global password strategies

Tex textexin at
Thu Apr 7 22:30:18 CDT 2022

Thanks for this Martin. Yes, the list is surprisingly quiet.

The industry is increasingly expanding to users that have no idea about Latin characters or digits.

Also, there are some apps that are serving users that are not computer literate. They may be handed a tablet to enter information or similar scenarios.
So they can't be expected to know to not enter Kanji for a password.

Restricting to ASCII can make it hard to remember for some users. I liken the experience to network passwords that are lengthy hex strings which are totally unmemorable.

The issue of control or other invisible characters is more problematic. If a user requires them for correct spelling, restricting them makes the passwords awkward.
And the harder problem is to define the correct list of characters to restrict.

Yes, normalization is problematic too.

Yes, writing direction is a lesser problem.

My inclination is to do nothing on the software side: no restrictions and no normalization, and when displaying passwords, set the direction to LTR (for stability and consistency).
I think trimming leading and trailing white space is reasonable. Having a length requirement is enforcing good practices that protect both the user and the software provider.

And as you say, telling the user (and the developers) "whatever you do, make sure you always do exactly the same thing" is probably the best we can do.
For developers, this means making sure you consistently don’t apply any functions between accepting the text and encrypting it.
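
As a rough illustration of that approach, here is a minimal Python sketch. The function name prepare_password and the MIN_LENGTH value are hypothetical, chosen only for this example. The only steps it applies are trimming leading/trailing white space and a length check; the same function would have to be used both when the password is set and at every login.

```python
MIN_LENGTH = 8  # hypothetical minimum, not a recommendation from this thread


def prepare_password(raw: str) -> str:
    """Trim surrounding white space and enforce a minimum length.

    Deliberately does nothing else: no normalization, no case folding,
    no character filtering.
    """
    trimmed = raw.strip()  # removes leading/trailing Unicode white space
    if len(trimmed) < MIN_LENGTH:
        # Tell the user instead of silently accepting or altering the input.
        raise ValueError(f"Password must have at least {MIN_LENGTH} characters.")
    return trimmed
```

Note that len() here counts Unicode code points, not what a user perceives as characters, so the limit may feel inconsistent across scripts.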

The downside is that some users will have problems when they use a different system or go through upgrades of their input methods.

Almost makes me a believer in fingerprint ID, retinal scans, embedded body chips, etc.


-----Original Message-----
From: Martin J. Dürst [mailto:duerst at] 
Sent: Thursday, April 7, 2022 5:38 PM
To: Tex; unicode at
Subject: Re: global password strategies

Hello Tex,

I'm surprised I haven't seen any answers to your post yet; I think it's 
a very interesting and important topic.

On 2022-04-05 08:23, Tex via Unicode wrote:
> What is the modern recommendation for globalization of passwords?
> 1)      If your application (web, mobile, desktop, etc.) is used worldwide, which characters do you allow or restrict?

I don't have an example of an application of my own where I made such 
decisions (in most cases, such decisions are made at a framework/library 
level). But in Japan at least, nobody expects to use anything other than 
ASCII in passwords. There are two interrelated reasons for this:
1) Kanji, Hiragana, and Katakana would require conversion, which would 
mean users have to visually check whether they got the right character. 
That's not a good idea for passwords.
2) Conversion choices get stored on the user's system to make future 
choices easier, but that would establish a side channel. An attacker may 
get access to that data, and when comparing before/after, can narrow 
down the choices for passwords considerably.
I'd expect this to at least apply for Chinese, too.

I'd also guess that many password-related libraries restrict input to 
ASCII. But with the deep penetration of smartphones around the world, 
the need for non-ASCII passwords is definitely increasing. As we are 
working on giving people fully non-ASCII email addresses, we shouldn't 
ignore passwords.

> 2)      How do you deal with writing direction?
> My concerns are that confirming and displaying a password might look different depending on how well the browser or OS implements RTL writing direction or features like dir=auto. A user may then not be able to log in because they are instructed to type it in a way that is inconsistent with what they have seen on the screen.

This is definitely a problem, but maybe not such a serious one. On such 
a system, the user may be used to such inconsistencies. The user knows 
what characters they intended to type, in what order. When they do a 
visual check, they don't need to verify the order, they only need to 
verify character identity. On smartphones, there are also many password 
input methods that only show the last character.

> 3)      Do you allow control or other invisible characters that a user may be used to typing in certain phrases? If these are allowed, how to indicate to the user that they have been used?

I'd just say the less allowed, the better.

> 4)      Also, should passwords be Unicode normalized? Seems damned if you do and if you don’t. Do text input methods generate text the same way or is it possible for a user to create a password on one system and then not be able to log in on another device?

The Mac used to do decomposition (NFD), and Windows uses composition 
(NFC), at least for file systems. I'm not sure this is still the case.
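
To make the NFC/NFD difference concrete, here is a small Python sketch using the standard unicodedata module. The sample string is an assumption chosen for illustration, and SHA-256 is used only to show that the results differ; it is not meant as a password hashing recommendation.

```python
import hashlib
import unicodedata

password = "D\u00fcrst"  # illustrative string containing u-umlaut

nfc = unicodedata.normalize("NFC", password)  # u-umlaut as one code point, U+00FC
nfd = unicodedata.normalize("NFD", password)  # "u" followed by combining U+0308

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(len(nfc), len(nfd))      # 5 6
print(nfc_bytes == nfd_bytes)  # False

# Different bytes in means a different digest out, so the login would fail.
same_digest = hashlib.sha256(nfc_bytes).digest() == hashlib.sha256(nfd_bytes).digest()
print(same_digest)             # False
```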

And there are other issues. In Arabic/Persian for example, there are 
different forms of the letter YEH, with different encodings, for things 
that may look the same on screen. An Arabic keyboard and a Farsi 
keyboard may produce different character codes.
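
A short Python check of this, assuming the two characters in question are U+064A (ARABIC LETTER YEH) and U+06CC (ARABIC LETTER FARSI YEH). It suggests that none of the four Unicode normalization forms maps one to the other, so normalization alone would not make such passwords match.

```python
import unicodedata

arabic_yeh = "\u064A"  # ARABIC LETTER YEH
farsi_yeh = "\u06CC"   # ARABIC LETTER FARSI YEH

# For each normalization form, record whether the two letters become equal.
results = {
    form: unicodedata.normalize(form, arabic_yeh) == unicodedata.normalize(form, farsi_yeh)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

print(unicodedata.name(arabic_yeh))
print(unicodedata.name(farsi_yeh))
print(results)
```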

> (Not normalization related, but I have experienced difficulty logging in to foreign systems, in hotels etc., when the keyboard is different and it takes a while to realize I have to abandon muscle memory and remember the actual password and look for the keys on the keyboard.)

The most important point is not "damned if you do and damned if you 
don't", but "whatever you do, make sure you always do exactly the same 
thing".

This starts way before you get into normalization. For example, do you 
remove leading/trailing white space? (The user may have copied the 
password from some text file. (That's not very good security, but some 
people still do it.))

Another example: Do you always have the same length restriction? I 
remember a case where I had set a password for a site, and on a sister 
site, it only worked after I tried to shorten it. What had happened was 
that when I set it, it got accepted but truncated without telling me, 
which worked well on the same site because the same truncation happened 
again. But the sister site didn't truncate, and this produced a 
mismatch. Make sure you tell people about such issues when they are 
setting a password, don't just 'fix' things behind the scenes.
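
One way to avoid that kind of silent truncation is to reject over-long input with a clear message. The following is only a sketch; MAX_LENGTH, PasswordTooLong, and check_length are hypothetical names invented for this example.

```python
MAX_LENGTH = 64  # hypothetical limit for this example


class PasswordTooLong(ValueError):
    """Raised instead of silently truncating the password."""


def check_length(password: str, max_length: int = MAX_LENGTH) -> str:
    """Return the password unchanged, or raise if it exceeds the limit."""
    if len(password) > max_length:
        raise PasswordTooLong(
            f"Password has {len(password)} characters; the limit is {max_length}."
        )
    return password  # never shortened behind the scenes
```

If two sites share credentials, both would need to apply the same check with the same limit.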

Also remember that password encryption algorithms work on binary data, 
not on characters. For ASCII-only, that doesn't usually cause problems, 
but when working with Unicode, you want to make sure you have a single 
encoding before the encryption.
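
As a sketch of what "a single encoding" can look like in practice, the Python example below always converts the password to UTF-8 bytes before passing it to a key derivation function. The choice of UTF-8, PBKDF2 (hashlib.pbkdf2_hmac from the standard library), the iteration count, and the function name hash_password are all assumptions for illustration, not something prescribed in this thread.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a key from the password, always via the same encoding."""
    data = password.encode("utf-8")  # the one fixed text-to-bytes step
    return hashlib.pbkdf2_hmac("sha256", data, salt, iterations)


salt = os.urandom(16)
password = "\u30d1\u30b9\u30ef\u30fc\u30c9"  # Katakana sample string

# The same characters give different bytes under different encodings,
# which is why the encoding must never vary between set-up and login.
encodings_differ = password.encode("utf-8") != password.encode("utf-16-le")
print(encodings_differ)  # True

# Same characters, same encoding, same salt: the derived keys match.
print(hash_password(password, salt) == hash_password(password, salt))  # True
```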

Please also note that "whatever you do, make sure you always do exactly 
the same thing" and using libraries or frameworks may not work well 
together, because different libraries/frameworks may do different things.

Regards,   Martin.
