Avoiding Source Code Spoofing

Thu Mar 3 03:02:16 CST 2022

Hi,

I think noting a couple of things will resolve your concerns:

   - The group is not primarily structured around the thrust of any
   particular report. The report you refer to is what started a wider
   discussion leading to the formation of this group, but this group is not
   directly attempting to address problems posed in that report. If you look
   closely, the framing of the problem in this announcement is radically
   different from the framing in that report.
   - The second-to-last paragraph of the post should make it clear that
   disadvantaging users of these scripts is not on the table.
   - There is significant work being done already to consult implementers
   of compilers and tooling. As for consulting users, I'm sure that will
   happen; it's just still pretty early in the process.

Thanks,
-Manish Goregaokar

On Thu, Mar 3, 2022 at 12:53 AM Eli Zaretskii via Unicode <
unicode at corp.unicode.org> wrote:

> > Date: Wed, 2 Mar 2022 14:51:44 -0800
> > From: announcements via announcements <announcements at corp.unicode.org>
> > Cc: announcements <announcements at corp.unicode.org>
> >
> > Unicode has convened a group of experts in programming languages,
> > tooling, and security to provide guidance and recommendations on how
> > to better handle international text in source code, as well as
> > providing code to help implementations.
>
> There was no address or place, either in this announcement or in the
> report to which it pointed, regarding where to send any comments on
> the issues raised by them, so I'm posting them here; apologies if that
> is inappropriate.
>
> First, I think the report fails to distinguish between the legitimate
> use of RTL characters and controls, which arises simply because the
> program code has strings and/or comments with RTL characters, and the
> malicious use, where the intent is to spoof and mislead the recipients
> of the code.
> Such a distinction is important, because use of bidi controls that is
> legitimate in the former case is highly suspicious in the latter.  For
> example, any source code where the inherent directionality of a strong
> directional character was overridden, or where a weak/neutral
> character has an embedding level that's too high, should be suspected
> as potentially malicious.
>
> Second, I don't see in the Proposed Plan any activity to collect input
> from users and implementors of compilers, linters, and editors.
> Without collecting such input, I see no way that the work group will
> appreciate the real-life problems and issues that the developers and
> users of these tools are facing, and that could easily lead to
> recommendations that are hard or impossible to implement at least in
> some of these tools, and/or which could be disconnected from the real
> problems and practices.  For example, the idea of rendering bidi
> formatting controls as "chits" will not solve the reordering issue in
> Emacs, where bidi reordering is performed _before_ the actual glyphs
> to present characters on the glass are fully known.  More generally,
> editors differ significantly in how they implement various features
> that support editing of program source, such as syntax highlighting
> and on-the-fly analysis of the source tokens; the recommendations must
> take these into consideration to be useful.
>
> Finally, I'm sorry to say, but the report is strongly biased in that
> it focuses almost entirely on the issues caused by visual reordering
> of bidirectional text and the bidi formatting controls in particular.
> While it does mention other issues that yield confusing program code,
> those few references read more as lip service than anything else.
> OTOH, there's no real attempt to describe the legitimate needs of
> program source code intended for RTL languages and scripts.  Without
> such a description, and with only the problematic (let alone
> malicious) use of bidi characters discussed in this and many
> referenced documents, this and future documents could lead to
> lopsided conclusions, like "let's disallow those problematic
> characters from program source code".  The risk is exacerbated by the
> fact that many people don't really understand the UBA and the needs
> of RTL scripts.  This isn't just theory: some compilers, evidently alarmed by
> the brouhaha around these issues, actually went ahead and started
> flagging the use of some of these characters in program source code as
> errors!  While such ridiculous (IMO) "solutions" in this or that tool
> could be dismissed as folly on the part of their developers, a
> document written and sanctioned by the Unicode Consortium which leads
> to similar conclusions would be a disastrous development, which would
> significantly hamper development of bidi-aware program development
> tools and disadvantage their users who work in RTL language
> environments.  I hope this is not how this (very important, IMO)
> initiative will end.
>
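
The distinction drawn in the quoted message, between bidi controls that
merely accompany RTL strings and comments and controls that override a
strong character or nest unusually deep, could in principle be checked
mechanically. What follows is only a minimal Python sketch under stated
assumptions: the names suspicious_bidi and MAX_DEPTH are hypothetical, the
depth threshold of 2 is an arbitrary illustration, and the stack handling
is a simplification of UAX #9 rather than a full UBA implementation.

```python
import unicodedata

# Explicit bidi formatting controls defined in UAX #9.
OVERRIDES = {"\u202D", "\u202E"}            # LRO, RLO
EMBEDDINGS = {"\u202A", "\u202B"}           # LRE, RLE
ISOLATES = {"\u2066", "\u2067", "\u2068"}   # LRI, RLI, FSI
PDF, PDI = "\u202C", "\u2069"               # terminators

STRONG = {"L", "R", "AL"}   # strong bidi classes
MAX_DEPTH = 2               # hypothetical threshold, chosen for illustration


def suspicious_bidi(line, max_depth=MAX_DEPTH):
    """Return (column, reason) pairs for bidi usage worth a closer look.

    Simplified model: a stack of open controls, innermost last.
    """
    findings = []
    stack = []
    for col, ch in enumerate(line):
        if ch in OVERRIDES or ch in EMBEDDINGS or ch in ISOLATES:
            stack.append(ch)
            if len(stack) > max_depth:
                findings.append((col, "nesting deeper than max_depth"))
        elif ch == PDF:
            # PDF closes the innermost embedding or override, not an isolate.
            if stack and stack[-1] not in ISOLATES:
                stack.pop()
        elif ch == PDI:
            # PDI closes the innermost isolate and anything opened inside it.
            if any(c in ISOLATES for c in stack):
                while stack.pop() not in ISOLATES:
                    pass
        elif (stack and stack[-1] in OVERRIDES
              and unicodedata.bidirectional(ch) in STRONG):
            findings.append(
                (col, "strong character under a directional override"))
    if stack:
        findings.append((len(line), "unterminated bidi control"))
    return findings
```

Under this sketch a plain Hebrew or Arabic string literal with no
formatting controls produces no findings, so ordinary RTL content is left
alone; only overrides acting on strong characters, deep nesting, and
controls left open at the end of the line are reported.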