Why do webforms often refuse non-ASCII characters?

Thu Jan 30 05:16:39 CST 2025

① Lack of Unicode education in university Computer Science departments. The basic functions of programming, storage, sorting, transmission, regex, etc. are taught using ASCII text only. Therefore Unicode programming, for example, is not something graduate Computer Science students even think about. The vast majority of Computer Science students that I have contact with have no knowledge of Unicode. So I consider it no surprise that there are so many systems that can only handle ASCII text. If Unicode education were core to the Computer Science curriculum, I think the situation would be much different and there would be many more systems which are Unicode friendly.
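
As a rough illustration of the regex point, here is a minimal Python sketch of the kind of name check that comes out of ASCII-only teaching, next to a Unicode-aware alternative. The function names and the ASCII pattern are hypothetical, made up for this example and not taken from any real system, and the Unicode check is only one possible policy (letters, combining marks, space, apostrophe, hyphen).

```python
import re
import unicodedata

# Hypothetical ASCII-only pattern, typical of what gets written when
# regex is only ever demonstrated on ASCII text.
ASCII_NAME = re.compile(r"[A-Za-z' -]+")


def is_valid_name_ascii(name: str) -> bool:
    """Accept only ASCII letters, space, apostrophe and hyphen."""
    return ASCII_NAME.fullmatch(name) is not None


def is_valid_name_unicode(name: str) -> bool:
    """Accept letters and combining marks from any script."""
    # Normalise to NFC so that "e" + U+0301 (combining acute accent)
    # is treated the same as the precomposed "é".
    name = unicodedata.normalize("NFC", name)
    if not name:
        return False
    # General categories starting with "L" are letters and those
    # starting with "M" are combining marks.
    return all(
        unicodedata.category(ch)[0] in ("L", "M") or ch in " '-"
        for ch in name
    )
```

With these definitions, is_valid_name_ascii("André") is False while is_valid_name_unicode("André") is True, and the Unicode version also accepts "Seán" and "Bríd-Áine Parnell".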

② Lack of interest/initiative/willingness/curiosity, or plain laziness? Actually, this one I find a lot more difficult to explain. In digital communication, the majority of people write my name as Andre instead of André. Why? They see me write my name as André. Does the diacritic not register with them? With my students, at the beginning of the academic year, I ask them to write my name correctly and add that it is not difficult to do so. Following that message, the majority write my name correctly; prior to it, they write Andre. So this general default, a lack of interest in writing names correctly, will be reflected in webforms.

André Schappo
________________________________
From: Unicode <unicode-bounces at corp.unicode.org> on behalf of Bríd-Áine Parnell via Unicode <unicode at corp.unicode.org>
Sent: 29 January 2025 15:39
To: unicode at corp.unicode.org <unicode at corp.unicode.org>
Subject: Why do webforms often refuse non-ASCII characters?

Hi everyone,

I'm hoping someone can help me out with some information. I'm doing some research into the refusal of accents in names (and other multicultural naming conventions) in online webforms. For example, in Ireland, there was a campaign recently to get the government to mandate acceptance of the fada in Irish language names (Seán instead of Sean). The campaign was successful, and the law changed in 2022, but it's only a requirement for public bodies; companies do not have to comply.

During the campaign, reports about some companies, including Bank of Ireland and Aer Lingus, were made to the Data Protection Commissioner under the right to rectification. The companies defended themselves by saying that their systems couldn't accept fadas in names.

I'm assuming that it's systems on the back end, such as database systems, that can't accept the so-called special characters. My question is, why would this be, given that Unicode would seem to solve this, and modern databases can use Unicode? Does anyone understand what the value is in continuing to retain legacy systems that only accept ASCII or some ISO variants? Or is there a different problem happening?

Appreciate any information that might shed light on this.

Thanks,

Bríd-Áine Parnell

Doctoral Researcher | Designing Responsible Natural Language Processing

School of Informatics | Edinburgh Futures Institute
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.