<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi,</p>
<p><br>
</p>
<p>I can see three reasons for this:</p>
<p><br>
</p>
<p>1. As you say, modern databases can handle this. But not all
databases currently in use are modern. State agencies, banks and
some other large corporations in particular are often still using
pretty old systems, or need to stay compatible with someone else's
old system. A famous example is airline ticketing: bookings run
through quirky old reservation systems that can't deal with
anything but ASCII letters, forcing many people to misspell their
names when booking a flight.</p>
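<p>To make that forced misspelling concrete, here is a minimal Python
sketch of the kind of lossy downgrade an ASCII-only system ends up
requiring. The function name fold_to_ascii is my own invention for
illustration, not something any particular booking system uses, and
real conventions differ by language (German, for instance, often
writes ü as ue rather than u).</p>

```python
import unicodedata


def fold_to_ascii(name: str) -> str:
    """Best-effort ASCII approximation of a name (lossy by design)."""
    # NFKD splits "ü" into "u" followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks that the decomposition exposed.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that is still not ASCII (e.g. "ø") is silently discarded.
    return stripped.encode("ascii", "ignore").decode("ascii")


print(fold_to_ascii("Jürgen"))  # Jurgen
print(fold_to_ascii("Seán"))    # Sean
```

<p>Note that letters without a decomposition simply disappear in this
sketch, which is exactly the sort of damage people complain about.</p>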
<p><br>
</p>
<p>2. The second reason is what I would call "lazy validation". You
have to make sure your system isn't vulnerable to query injection,
code injection, and perhaps spoofing, i.e. you have to forbid some
characters that have a special function in whatever query
language, programming language(s) and markup language(s) you use,
e.g. ' for SQL, &lt; and " for HTML and so on. If you forget
any of these, you have a huge security problem. So the easiest and
safest way* to ensure this is to whitelist just the characters
that you know to be safe. You can do that correctly
based on Unicode's character properties, but I've also seen people
using overly simple regular expressions like /[A-Za-z0-9]+/, which
cause the problem you described.<br>
</p>
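<p>As a rough illustration of the difference, the Python sketch below
puts the too-simple regular expression next to a whitelist based on
Unicode general categories. The function names and the small set of
extra allowed characters (space and hyphen) are assumptions of mine
for the example, not a recommended policy.</p>

```python
import re
import unicodedata

# The overly simple pattern from above; fullmatch() requires it to cover the whole name.
ASCII_ONLY = re.compile(r"[A-Za-z0-9]+")

# Hypothetical policy: letters (L*) and combining marks (M*) from any script,
# plus a couple of separators that commonly occur in names.
EXTRA_ALLOWED = {" ", "-"}


def is_valid_name_ascii(name: str) -> bool:
    return ASCII_ONLY.fullmatch(name) is not None


def is_valid_name_unicode(name: str) -> bool:
    # Normalise first so that "a" + combining acute and precomposed "á" are treated alike.
    name = unicodedata.normalize("NFC", name)
    if not name.strip():
        return False
    return all(
        unicodedata.category(ch)[0] in ("L", "M") or ch in EXTRA_ALLOWED
        for ch in name
    )
```

<p>With this sketch, is_valid_name_ascii("Seán") is False while
is_valid_name_unicode("Seán") is True, and a string containing ' or ;
is still rejected by both.</p>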
<p><br>
</p>
<p>* Apart from using libraries that already solve the problem
properly, of course. But surprisingly many people keep
re-implementing existing things for some reason.</p>
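<p>For the SQL case, "using a library" can be as simple as letting the
database driver bind the values. Here is a minimal sketch with Python's
built-in sqlite3 module; the table, column and helper names are made up
for the example.</p>

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")


def add_customer(conn: sqlite3.Connection, name: str) -> None:
    # The "?" placeholder makes the driver pass the value separately from the
    # SQL text, so quotes or accents in the name are never parsed as SQL.
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


def find_customer(conn: sqlite3.Connection, name: str):
    row = conn.execute(
        "SELECT name FROM customers WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


add_customer(conn, "Seán O'Brien")
add_customer(conn, "x'); DROP TABLE customers; --")
```

<p>Both names are stored verbatim as data, so no character has to be
forbidden just to keep the query safe.</p>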
<p><br>
</p>
<p>3. Inconvenience for one's own staff: Even if a system can handle
"special" characters, they may still be a hassle to work with. I
once visited Japan with a friend whose name was Jürgen, and when
they typed our names into their system, it took four people
ten minutes of discussion to work out how to insert the ü. Also,
checking whether things like names are correct and match across
different documents is much harder if people can't easily read
them.<br>
</p>
<p><br>
</p>
<p>Kind regards,<br>
Alexander<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 29.01.2025 16:39, Bríd-Áine Parnell
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:AS8PR05MB83255037D11636CC4DB62AA1BFEE2@AS8PR05MB8325.eurprd05.prod.outlook.com">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi everyone,</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I'm hoping someone can help me out with some information. I'm
doing some research into the refusal of accents in names (and
other multicultural naming conventions) in online webforms. For
example, in Ireland, there was a campaign recently to get the
government to mandate acceptance of the fada in Irish language
names (Seán instead of Sean). The campaign was successful, and
the law changed in 2022, but it's only a requirement for public
bodies; companies do not have to comply.</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
During the campaign, reports were made to the Data Protection
Commissioner on the right to rectify about some of the
companies, including Bank of Ireland and Aer Lingus. They
defended themselves by saying that their systems couldn't accept
fadas in names.</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I'm assuming that it's systems on the back end, such as database
systems, that can't accept the so-called special characters. My
question is, why would this be, given that Unicode would seem to
solve this, and modern databases can use Unicode? Does anyone
understand what the value is in continuing to retain legacy
systems that only accept ASCII or some ISO variants? Or is there
a different problem happening?</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Appreciate any information that might shed light on this. </div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div class="elementToProof"
style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature" class="elementToProof">
<div
style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<b>Bríd-Áine Parnell</b></div>
<div
style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div
style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 13.3333px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);">Doctoral
Researcher | Designing Responsible Natural Language
Processing</span></div>
<div
style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 13.3333px; color: rgb(0, 0, 0);">
<span style="background-color: rgb(255, 255, 255);"><br>
</span></div>
<div
style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 13.3333px; color: rgb(0, 0, 0);">
School of Informatics | Edinburgh Futures Institute </div>
</div>
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336. Is e buidheann
carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an
Alba, àireamh clàraidh SC005336.
</blockquote>
</body>
</html>