UTF-8 display (was: Re: a mug)

philip chastney philip_chastney at yahoo.com
Tue Jul 21 07:49:52 CDT 2015


so the webmaster put up the page, declaring the charset to be UTF-8...

but what charset was being used by the guy who knocked out the HTML?

it could be more complicated than that: maybe the page was produced using UTF-8, 
somebody reads the page using, say, Windows-1252, and "converts" it to UTF-8
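A minimal Python sketch of that double-conversion scenario. It assumes the page contained U+2026 and that the hypothetical "conversion" simply re-encoded the misread characters as UTF-8. After that step, declaring charset=UTF-8 no longer helps, because the garbling is now in the data itself.

```python
# Hypothetical illustration: text authored in UTF-8, misread as
# Windows-1252 (Python codec name "cp1252"), then "converted" to UTF-8.
original = "Our apologies\u2026"            # U+2026 HORIZONTAL ELLIPSIS
utf8_bytes = original.encode("utf-8")       # ellipsis -> b"\xe2\x80\xa6"
misread = utf8_bytes.decode("cp1252")       # each byte read as one cp1252 char
double_encoded = misread.encode("utf-8")    # valid UTF-8, but of the wrong text

# Labelling this as UTF-8 still shows the mojibake.
shown = double_encoded.decode("utf-8")
print(shown)  # Our apologiesâ€¦

# Undoing it means reversing each step. This works here only because
# every byte of the original UTF-8 maps to a character in cp1252.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```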

I'm sure, with a little effort, ever more complicated scenarii could be constructed
--  it's amazing what can be achieved when arrogance and ignorance are combined

/phil

--------------------------------------------
On Tue, 21/7/15, Marcel Schneider <charupdate at orange.fr> wrote:

 Subject: UTF-8 display (was: Re: a mug)
 To: "UmeshPN" <umesh.p.nair at gmail.com>, "DanielBünzli" <daniel.buenzli at erratique.ch>
 Cc: "UnicodeMailingList" <unicode at unicode.org>
 Date: Tuesday, 21 July, 2015, 8:46 AM
 
 On 13 Jul 2015, at 11:28, I wrote:
 
 > The only time I saw UTF-8 displayed like on the T-shirt
 > was when opening UTF-8 files that didn't specify
 > charset=UTF-8. The thing to do was to declare the
 > charset in the file header.
 
 Now I see that this issue is much trickier. I've just
 stumbled on an error page instead of (or at the URL of)
 http://www-01.ibm.com/software/globalization/topics/keyboards/physical.jsp
 where I read:
 Our apologies…
 while the source as displayed
 by Firefox shows:
 charset=utf-8
 Our apologies
 (The markup comes from the h1 heading tags.)
 
 The catch is that the actual HTML file, as saved by
 Zotero, contains:
 Our apologies…
 (with a U+2026)
 and is declared as... 
 charset=windows-1252
 
 Once I changed this to utf-8, the page displayed
 correctly:
 Our apologies…
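A minimal Python sketch of what likely happened here. It assumes the saved file's bytes are UTF-8 while its header declares windows-1252 (the Python codec name is cp1252). Under that label, the three UTF-8 bytes of U+2026 are each read as a separate Windows-1252 character. Relabelling the same bytes as utf-8 restores the ellipsis.

```python
# Assumption: bytes on disk are UTF-8, header says charset=windows-1252.
text = "Our apologies\u2026"       # U+2026 HORIZONTAL ELLIPSIS
raw = text.encode("utf-8")         # ellipsis is stored as b"\xe2\x80\xa6"

as_labelled = raw.decode("cp1252") # what a browser trusting the header shows
relabelled = raw.decode("utf-8")   # after changing the charset to utf-8

print(raw[-3:])     # b'\xe2\x80\xa6'
print(as_labelled)  # Our apologiesâ€¦
print(relabelled)   # Our apologies…
```

Unlike the double-encoding case, nothing in the data is damaged here. Only the declaration is wrong, which is why editing the charset alone fixes the display.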
 
 This may be why people are puzzled by UTF-8 to the
 extent we've seen.
 
 So I would like to apologize to the List, and ask whether
 anyone could help us understand the real problem
 (browsers, web editors, or something else) and how to fix
 it. I don't think it's a mere HTML issue, as it concerns
 the Unicode Transformation Format.
 
 Best regards,
 
 Marcel
