Do you know a tool to decode "UTF-8 twice"

Tex Texin textexin at
Wed Jan 29 15:09:16 CST 2014

Since it is neither cp1252 nor iso8859, perhaps call it whatwg-latin or whatwg-1.

If, or when, 1252 is updated to assign a character to a currently undefined
code point, it will be problematic to have both names refer to 1252. That
could happen if, for example, a new currency symbol is added in Latin
America, as has been discussed from time to time.


Anyone writing decoders for the Whatwg encoding should also be on notice
that it is not necessarily a superset of 1252 going forward, and should
design for the potential distinction down the road.
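
As a minimal sketch of what designing for that distinction could look like,
the Python below keeps the bytes that differ today in an explicit table
instead of treating the WHATWG encoding as an alias of cp1252. The names
decode_whatwg_1252 and WHATWG_ONLY are hypothetical. The sketch assumes
Python's built-in cp1252 codec, which leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
undefined, whereas the WHATWG Encoding Standard maps each of them to the C1
control character with the same value.

```python
# Bytes that cp1252 leaves undefined but the WHATWG windows-1252 index maps
# to C1 controls. Keeping them in their own table means a future change to
# either side only requires editing this mapping (hypothetical design).
WHATWG_ONLY = {
    0x81: "\u0081",
    0x8D: "\u008d",
    0x8F: "\u008f",
    0x90: "\u0090",
    0x9D: "\u009d",
}


def decode_whatwg_1252(data: bytes) -> str:
    """Decode bytes per the WHATWG windows-1252 index (sketch, not a spec)."""
    out = []
    for b in data:
        if b in WHATWG_ONLY:
            # WHATWG-specific mapping takes priority over cp1252.
            out.append(WHATWG_ONLY[b])
        else:
            # Every other byte is defined identically in Python's cp1252.
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)
```

With this, decode_whatwg_1252(b"\x81") returns U+0081, while
b"\x81".decode("cp1252") raises UnicodeDecodeError in Python.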


I am tempted to suggest we call it “Whatwg-Not-your-fathers-1252” which also
would serve appropriate notice…




From: Unicode [mailto:unicode-bounces at] On Behalf Of Buck Golemon
Sent: Wednesday, January 29, 2014 11:57 AM
To: Anne van Kesteren
Cc: unicode; unicode at; Jörg Knappen; Frédéric
Grosshans; Markus Scherer
Subject: Re: Do you know a tool to decode "UTF-8 twice"


Anne: Given that the intent is to implement exactly the whatwg spec, and the
group is currently called "whatwg" (even though that name may eventually
become a historical artifact), is "whatwg-1252" the most appropriate name?


Norbert Lindenberg previously suggested standardizing some kind of


Do you most prefer the s/web-/cp/ pattern?


On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren <annevk at> wrote:

On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer < at> wrote:
> On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon <buck at> wrote:
>> I've been considering naming it cp1252-whatwg.
> It would be nicer to put the organization name first, such as
> or maybe better html-cp1252. That would be more like ibm-932 and such.

If you want to support more encodings than defines I suggest using the prefix
"web-". The organization may change and this is not tied to HTML.


