Need reference to good ABNF for \uXXXX syntax

Martin J. Dürst duerst at it.aoyama.ac.jp
Wed Apr 14 18:50:43 CDT 2021


Hello Doug,

On 2021-04-15 01:41, Doug Ewell via Unicode wrote:
> Is anyone aware of an existing RFC or other specification that includes complete, correct, and clear ABNF for Unicode escape sequences using the UTF-16 encoding scheme?
> 
> Examples:
> \u0041
> \u3042
> \uD801\uDC02  (NOT: \U0001042A)
> 
> This type of sequence is described in Section 6.3 of RFC 5137, but that RFC does not recommend this syntax and does not include ABNF for it.
> 
> "Correct" implies, for instance, that the ABNF excludes unpaired surrogates.
> 
> To be clear, I'm NOT looking for someone on this list to contribute their own code, but rather a pointer to code that is already published, and easy for another document, such as an I-D, to reference.

So I guess you are looking for something like the regular expression on
https://www.w3.org/International/questions/qa-forms-utf-8, but for the 
above syntax (rather than byte sequences in UTF-8) and in ABNF.

The closest I was able to come up with from memory is 
https://tools.ietf.org/html/rfc5137, but it's not exactly what you want. 
I'd guess it would be quicker for you to put something together on your 
own (and then maybe run it by this list).
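
As a rough starting point for such a draft, here is a minimal sketch of the
constraint as a Python regular expression rather than ABNF. It is not taken
from any RFC or other published specification; the names (HEX, BMP_SCALAR,
HIGH, LOW, is_well_formed) are made up for illustration. It assumes the
hex digits are case-insensitive and that a string consists only of escapes.
Each alternative should translate fairly directly into an ABNF rule.

```python
import re

# Hypothetical building blocks, not from any published grammar.
HEX = "[0-9A-Fa-f]"

# \u0000-\uD7FF and \uE000-\uFFFF: a single non-surrogate code unit.
BMP_SCALAR = r"\\u(?:[0-9A-CEFa-cef]" + HEX + "{3}|[Dd][0-7]" + HEX + "{2})"

# \uD800-\uDBFF: high (leading) surrogate.
HIGH = r"\\u[Dd][89ABab]" + HEX + "{2}"

# \uDC00-\uDFFF: low (trailing) surrogate.
LOW = r"\\u[Dd][C-Fc-f]" + HEX + "{2}"

# One or more escapes; a surrogate is accepted only as a HIGH LOW pair.
ESCAPES = re.compile("(?:" + BMP_SCALAR + "|" + HIGH + LOW + ")+")


def is_well_formed(text: str) -> bool:
    """Return True if text is made up solely of well-formed \\uXXXX escapes."""
    return ESCAPES.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in (r"\u0041", r"\u3042", r"\uD801\uDC02", r"\uD801", r"\U0001042A"):
        print(sample, is_well_formed(sample))
```

The three examples from the original message are accepted, while a lone
surrogate such as \uD801 and the \U0001042A form are rejected.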

Regards,   Martin.


> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
> 

