Need reference to good ABNF for \uXXXX syntax

Kent Karlsson kent.b.karlsson at bahnhof.se
Sat Apr 17 08:13:21 CDT 2021


(Going a bit further off the original topic of this thread…)

> On 16 Apr 2021, at 22:54, Carsten Bormann <cabo at tzi.org> wrote:
> 
> 
>> On 16. Apr 2021, at 22:09, Kent Karlsson <kent.b.karlsson at bahnhof.se> wrote:
>> 
>>> SESC = "\" ( %x22 / %x2F / %x5C / %x62 / %x66 / %x6E / %x72 / %x74 /
>>>           (%x75 hexchar) )
>> 
>> 1) Why are some ”very plain letters in ASCII” given as hex escapes here? Esp. since the not so plain (it is used as an escape, which is the point here…) ”\” has not warranted a hex escape. (The grammar even uses it to escape ”, which is a bit ironic).
> 
> Because RFC 8259 does.
> 
> This is ABNF, so there are some peculiarities to be taken care of.
> Long form: %x2F and %x5C really should be “/” and “\”, as I wrote before.
> %x22 is a convenient form to put a double quote into ABNF (there is no escaping in ABNF, which was invented around 1977, for RFC 733).
> 
> %x62 / %x66 / %x6E / %x72 / %x74 are of course “b”/“f”/“n”/“r”/“t”, which prefixed by “\” are popular white space escapes

\b is usually used for backspace (going back decades in tradition…). But backspace is NOT a ”whitespace” character at all. Whether it was used to create bold (on typewriter-style terminals), to overtype in order to create combined characters (long since deprecated), or as a command to erase the character preceding the current position, it has never been a whitespace character.

However, vertical tab (sometimes representable as a \v escape, as in C/C++, JavaScript, Go, and PHP), nowadays used more as an ASCII representation of LINE SEPARATOR than for vertical tabulation, is usually regarded as a whitespace character.
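
As a rough illustration of the distinction (a minimal sketch, not a normative test): Python's str.isspace() and its json module are used here only as convenient stand-ins for "is this a whitespace character" and "is this a JSON escape". The helper name json_accepts is made up for this example.

```python
import json

BACKSPACE = "\b"      # U+0008
VERTICAL_TAB = "\v"   # U+000B

# str.isspace() consults the Unicode character database.
backspace_is_space = BACKSPACE.isspace()
vertical_tab_is_space = VERTICAL_TAB.isspace()


def json_accepts(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# JSON has a \b escape (decoding to U+0008) but no \v escape.
decoded_backspace = json.loads(r'"\b"')
json_has_b_escape = json_accepts(r'"\b"')
json_has_v_escape = json_accepts(r'"\v"')

print(backspace_is_space, vertical_tab_is_space)
print(json_has_b_escape, json_has_v_escape)
```

So the one escape in the list that is not whitespace has a short form, while a control character that usually is treated as whitespace needs \u000B.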

/Kent K

> so you don’t have to use \uXXXX for them.
> Unfortunately, writing “b”/“f”/“n”/“r”/“t” in ABNF would invoke the default case-insensitivity of ABNF (think 1977 again).
> This could be written %s“b”/%s“f”/%s“n”/%s“r”/%s“t” with the ABNF extension documented in RFC 7405, but RFC 7159 (that became RFC 8259 later) was written before RFC 7405 (obviously). Also, not using the extension slightly widens the set of tools that can be used with this ABNF.
> 
> I apologise for polluting this list with arcane details of JavaScript and ABNF, but those are the reasons this grammar looks like it does.
> 
> Grüße, Carsten
> 
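
The case-sensitivity point in the quoted message can be sketched with two regular expressions. This is only an approximation under stated assumptions: the names SESC_HEX and SESC_QUOTED are invented for this example, and since "hexchar" is not defined in the quoted excerpt, it is assumed here to be four hex digits as in RFC 8259's \uXXXX.

```python
import re

# Assumption: "hexchar" means exactly four hex digits (not shown in the excerpt).
HEXCHAR = r"[0-9A-Fa-f]{4}"

# %x62 etc. each denote one exact code point, so only lowercase letters match.
SESC_HEX = re.compile(r'\\(?:["/\\bfnrt]|u' + HEXCHAR + r')')

# Plain quoted ABNF literals such as "b" are case-insensitive (RFC 5234),
# so writing the letters that way would also admit \B, \F, \N, \R, \T.
# The "u" is left as %x75 here, so it stays case-sensitive.
SESC_QUOTED = re.compile(r'\\(?:["/\\]|(?i:[bfnrt])|u' + HEXCHAR + r')')

for candidate in (r'\n', r'\N', r'\u00e9', r'\v'):
    print(candidate,
          bool(SESC_HEX.fullmatch(candidate)),
          bool(SESC_QUOTED.fullmatch(candidate)))
```

With RFC 7405's %s"b" form the quoted variant would behave like the hex one, which is the trade-off described above.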



