UTF-8 display (was: Re: a mug)
philip_chastney at yahoo.com
Tue Jul 21 07:49:52 CDT 2015
so the webmaster put up the page, declaring the charset to be UTF-8...
but what charset was being used by the guy who knocked out the HTML?
it could be more complicated than that: maybe the page was produced using UTF-8,
somebody reads the page using, say, Windows-1252, and "converts" it to UTF-8
I'm sure, with a little effort, ever more complicated scenarii could be constructed
-- it's amazing what can be achieved when arrogance and ignorance are combined
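
For what it's worth, the "read as Windows-1252, then convert to UTF-8" scenario can be reproduced in a few lines. This is only a sketch: the function names and the sample string are made up for illustration, and it assumes Python's "cp1252" codec stands in for whatever tool did the misreading.

```python
def double_encode(text: str) -> bytes:
    """Simulate the accident: UTF-8 bytes are misread as Windows-1252,
    and the misread text is then "converted" (re-encoded) to UTF-8."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("cp1252")   # wrong charset assumed here
    return misread.encode("utf-8")          # now valid UTF-8, wrong content


def repair(data: bytes) -> str:
    """Undo one round of the accident, assuming it was exactly cp1252.

    Raises UnicodeError if the bytes were not damaged in that way."""
    return data.decode("utf-8").encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    original = "B\u00fcnzli\u2026"          # u-umlaut plus U+2026
    damaged = double_encode(original)
    print(damaged.decode("utf-8"))          # what a UTF-8 reader would show
    print(repair(damaged))
```

The repair only works when every damaged byte maps back through cp1252; bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) make the first step fail outright in Python, whereas some other tools substitute a replacement character instead.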
On Tue, 21/7/15, Marcel Schneider <charupdate at orange.fr> wrote:
Subject: UTF-8 display (was: Re: a mug)
To: "UmeshPN" <umesh.p.nair at gmail.com>, "DanielBünzli" <daniel.buenzli at erratique.ch>
Cc: "UnicodeMailingList" <unicode at unicode.org>
Date: Tuesday, 21 July, 2015, 8:46 AM
On 13 Jul 2015, at 11:28, I wrote:
> The only time I saw UTF-8 like on the T-shirt was when opening UTF-8 files
> that didn't specify charset=UTF-8. The thing to do was to add the charset in
> the file header.
Now I see that this issue is much trickier. I've just stumbled over a page
that fails to display instead of (or at the URL of)
where I read:
while the source as displayed by Firefox shows:
(The markup comes from the header 1 tags.)
The trick is that the real HTML file as saved by Zotero contains:
(with a U+2026)
and is encoded in...
Once I changed this to utf-8, the page displayed correctly:
This may be why people are puzzled by UTF-8 to the extent we've seen.
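
A rough way to spot this kind of mismatch between the declared charset and the actual bytes is sketched below. The saved file's real declaration is not quoted above, so the sample markup, the "windows-1252" value and all function names here are hypothetical; the regular expression is a crude stand-in for a real HTML parser.

```python
import re

# Matches both <meta charset="..."> and the http-equiv form
# (content="text/html; charset=...").
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE
)


def declared_charset(html: bytes):
    """Return the charset named in a <meta> tag, lowercased, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def charset_mismatch(html: bytes) -> bool:
    """True if the file holds non-ASCII bytes that are valid UTF-8
    while its <meta> tag declares some other charset."""
    has_non_ascii = any(byte > 0x7F for byte in html)
    declared = declared_charset(html)
    return (
        has_non_ascii
        and is_valid_utf8(html)
        and declared not in (None, "utf-8", "utf8")
    )


if __name__ == "__main__":
    # Hypothetical page: UTF-8 bytes (including U+2026), wrong declaration.
    page = (
        '<html><head><meta charset="windows-1252"></head>'
        "<body><h1>Unicode\u2026</h1></body></html>"
    ).encode("utf-8")
    print(declared_charset(page), charset_mismatch(page))
```

A check like this only flags a suspicious file; it cannot tell which tool wrote the wrong declaration, and an HTTP Content-Type header, if present, would take precedence over the tag in a browser.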
So I would like to present my apologies to the List, and ask if anyone could
help us identify the real problem (browsers, web editors, or something else)
and how to fix it. I don't think it's a mere HTML issue, as it concerns the
Unicode Transformation Format.