Re: Do you know a tool to decode "UTF-8 twice"

Buck Golemon buck at yelp.com
Thu Jan 30 12:15:47 CST 2014


While I understand your argument, my intent was to suggest that
"mysql-latin1" was *not* as good as some other name. Surely you're not
arguing that all names are equally good. Obviously "mnmmmnmn" is a
worse name than "mysql-latin1".

The "mysql" part has less to do with the issue than "whatwg" or "web",
since this codec is necessary any time you want to reproduce browser
decoding, regardless of whether MySQL is involved. I contend that MySQL
adopted this implementation because it is so widely used for web
applications.

The "latin1" part is less accurate than "cp1252". While WHATWG requires
that latin1 be an alias of cp1252, it does the same for ascii, and it
maintains that the canonical name is "windows-1252".
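
To make the difference concrete, here is a minimal sketch in Python of how
browser-style decoding differs from Python's built-in "cp1252" codec. It
assumes the only difference is the five bytes that Python leaves undefined,
which the WHATWG windows-1252 index maps to the C1 control characters with
the same code points. The names C1_GAPS, browser_decode_1252 and
strict_cp1252_fails are made up for illustration.

```python
# Bytes that Python's built-in "cp1252" codec treats as undefined.
# The WHATWG windows-1252 index maps each of them to the C1 control
# character with the same code point.
C1_GAPS = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def browser_decode_1252(data: bytes) -> str:
    """Hypothetical helper: decode single-byte data as a browser would."""
    chars = []
    for byte in data:
        if byte in C1_GAPS:
            # Pass the byte through as U+0081, U+008D, U+008F, U+0090 or U+009D.
            chars.append(chr(byte))
        else:
            # Every other byte agrees with Python's cp1252 table.
            chars.append(bytes([byte]).decode("cp1252"))
    return "".join(chars)


def strict_cp1252_fails(data: bytes) -> bool:
    """Report whether Python's own cp1252 codec rejects the input."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError:
        return True
    return False
```

With this, strict_cp1252_fails(b"\x81") is true, while
browser_decode_1252(b"\x81") returns the single character U+0081.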

Ideally you'd want to update the name of your project, but if not, that's
your preference :)

However, if I can get some consensus on a least-bad name ("web-cp1252" with
alias "web-windows-1252" seems to be in the lead), I plan to release such a
codec.
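
For illustration only, here is a rough sketch of how such a codec could be
registered in Python under the names currently in the lead. This is not a
released package. The decoding table is an assumption: it starts from
Python's cp1252 and fills the five undefined bytes with the matching C1
control characters, as the WHATWG index does. All helper names are
hypothetical.

```python
import codecs

# Assumption: the names proposed in this thread; nothing ships under them yet.
CODEC_NAME = "web-cp1252"
ALIASES = {"web_cp1252", "web_windows_1252"}


def _build_decoding_table() -> str:
    """Start from Python's cp1252 and fill its gaps with C1 controls."""
    table = []
    for byte in range(256):
        try:
            table.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # One of 0x81, 0x8D, 0x8F, 0x90, 0x9D: map to the same code point.
            table.append(chr(byte))
    return "".join(table)


DECODING_TABLE = _build_decoding_table()
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _search(name):
    # Python versions differ on whether hyphens arrive as underscores,
    # so normalize before comparing.
    if name.replace("-", "_") in ALIASES:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)
```

After registration, b"\x81".decode("web-cp1252") succeeds where
b"\x81".decode("cp1252") raises UnicodeDecodeError, and every byte value
round-trips through the codec.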

This issue also extends far beyond Python. Any language that deals with the
web (i.e. all of them) and wants to interpret (legacy) bytes exactly as a
browser would (admittedly a niche but still important task) needs such a
codec. I believe unicode.org should eventually recognize such a codec.
Ideally it would reflect that this is the most common implementation of
cp1252, but if I need to use a different name, that's better than nothing
at all.


On Jan 30, 2014 12:31 AM, Jörg Knappen <jknappen at web.de> wrote:

>  When you are looking for a *new* name for that encoding, why don't you
> just adopt the pythonese precedent
> mysql-latin1 ? It is as good or as bad as any other name, but has some
> footing just now.
>
> --Jörg Knappen
>
> *Sent:* Wednesday, January 29, 2014 at 21:12
> *From:* "Anne van Kesteren" <annevk at annevk.nl>
> *To:* "Buck Golemon" <buck at yelp.com>
> *Cc:* "Markus Scherer" <markus.icu at gmail.com>, "Jörg Knappen" <
> jknappen at web.de>, "Frédéric Grosshans" <frederic.grosshans at gmail.com>,
> unicode <unicode at unicode.org>, unicode at norbertlindenberg.com
> *Subject:* Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8
> twice"
> On Wed, Jan 29, 2014 at 11:57 AM, Buck Golemon <buck at yelp.com> wrote:
> > Anne: Given that the intent is to implement exactly the whatwg spec, and
> > the group is currently called "whatwg" (even though it may eventually
> > become a historical artifact), is "whatwg-1252" most appropriate?
>
> It's up to you I suppose, but "whatwg-1252" just seems like long term
> it will lose its meaning. For the web "windows-1252" will always have
> this meaning due to deployed content, so "web-windows-1252" if you
> need to disambiguate from a different implementation of windows-1252
> makes sense to me.
>
>
> --
> http://annevankesteren.nl/