Why do webforms often refuse non-ASCII characters?

Phil Smith III lists at akphs.com
Thu Jan 30 16:36:02 CST 2025


I meant that André's display name shows up as Andre, without the accent. Yours shows as Slawomir Osipiuk, no velarized L. Are you seeing an accent on André in your MUA?

Oh, wait, I dug into the raw SMTP transaction, interesting: your From: is encoded as
	From: =?utf-8?q?S=C5=82awomir_Osipiuk  
but is STILL displayed in both Outlook and Aqua Mail (the latter on Android) as
	From: Slawomir Osipiuk
The =?utf-8?q?...?= syntax is the "encoded-word" mechanism defined in RFC 2047 (RFC 2231 adds the "encoded word extensions").
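
For illustration, here is a minimal Python sketch of how a Q-encoded word like that decodes, using the standard library's email.header.decode_header. The header value is a hypothetical complete form of the display name (with the closing ?= that the line quoted above lacks), and decode_encoded_words is just an illustrative helper name, not anything a particular MUA uses.

```python
from email.header import decode_header


def decode_encoded_words(value: str) -> str:
    """Decode RFC 2047 encoded-words in a header value into plain text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus their declared charset;
            # unencoded runs come back as bytes with charset None.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # A value with no encoded words at all is returned as str.
            parts.append(chunk)
    return "".join(parts)


# Hypothetical, complete encoded word (assumption: terminated with "?=").
# In the Q encoding, =C5=82 is the UTF-8 byte pair for U+0142 and "_" is a space.
display_name = decode_encoded_words("=?utf-8?q?S=C5=82awomir_Osipiuk?=")
print(display_name)
```

If a mail client went through an equivalent decoding step, the display name would come out with the slashed L intact, which suggests the accent is being lost somewhere other than in the encoding itself.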

However, André's is just
	From: Andre Schappo via Unicode <unicode at corp.unicode.org>
so no wonder we don't see an accent. Still doesn't explain your missing slashed-L, of course.

Digging further, empirical evidence is that Outlook renders "B"-encoded UTF-8 correctly, at least when the text consists entirely of non-Western-European characters: I have plenty of stored spam that's Chinese, and those headers are encoded as =?utf-8?B?xxxx... as opposed to the =?utf-8?q?xxxx... that your headers use.

I found https://dogmamix.com/MimeHeadersDecoder/ but it seems to support the B encoding only, so I'm concluding that the Q encoding is somehow not as "real" despite dating back to 1996, and despite RFC 2047 saying:
"Nevertheless, a mail reader which claims to recognize 'encoded-word's MUST be able to accept either encoding for any character set which it supports."
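
To make the "either encoding" point concrete, here is a small Python sketch that builds a B-encoded word for the same name and checks that it decodes to the same text as the Q-encoded one. Both encoded words are constructed for illustration (neither is copied from a real header), and decode_word is a hypothetical helper name.

```python
import base64
from email.header import decode_header

name = "S\u0142awomir Osipiuk"  # display name with U+0142 (slashed L)

# B encoding: base64 of the UTF-8 bytes, wrapped in the encoded-word syntax.
b_word = "=?utf-8?B?" + base64.b64encode(name.encode("utf-8")).decode("ascii") + "?="

# Q encoding of the same bytes (hypothetical complete form, ending in "?=").
q_word = "=?utf-8?q?S=C5=82awomir_Osipiuk?="


def decode_word(word: str) -> str:
    """Decode a header value consisting of a single encoded word."""
    payload, charset = decode_header(word)[0]
    return payload.decode(charset)


# Both forms should yield the identical string.
print(decode_word(b_word) == decode_word(q_word) == name)
```

A decoder that follows the RFC 2047 requirement quoted above would treat the two forms interchangeably, so a tool or client that handles only B is falling short of that requirement rather than revealing anything wrong with Q.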

One thing is for certain: this is once again proving that something "trivial" really isn't!

-----Original Message-----
From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of Slawomir Osipiuk via Unicode
Sent: Thursday, January 30, 2025 1:06 PM
To: Phil Smith III <lists at akphs.com>; Andre Schappo <A.Schappo at lboro.ac.uk>
Cc: unicode at corp.unicode.org
Subject: RE: Why do webforms often refuse non-ASCII characters?


On Thursday, 30 January 2025, 11:42:42 (-05:00), Phil Smith III via Unicode
wrote:

 >  Or is the list server doing this to you?

It doesn't seem to have a problem with me, nor with the originator of this thread. 



