Expressing any Unicode character using Morse code

Sławomir Osipiuk sosipiuk at gmail.com
Mon Aug 7 16:51:32 CDT 2023


Compactness is of great benefit in Morse code. I would therefore recommend against any padding, against requiring any additional character to specify length, and indeed against worrying about "metadata precision" generally. For the same reason I would also use some flavour of base32 (I prefer Crockford's alphabet over the one in RFC 4648, though that detail doesn't matter so much). Four base32 digits cover 20 bits, so every code point in planes 0 through 15 can be encoded using at most 4 Morse letters in the sequence; only plane 16 needs a fifth.
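To make the digit count concrete, here is a minimal sketch in Python. It assumes Crockford's alphabet and simply drops leading zeros; the helper name to_base32 is made up for illustration and is not part of any proposal.

```python
# Crockford base32 alphabet: the digits, then the letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def to_base32(code_point: int) -> str:
    """Shortest Crockford base32 spelling of a code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    digits = ""
    while True:
        code_point, remainder = divmod(code_point, 32)
        digits = CROCKFORD[remainder] + digits
        if code_point == 0:
            return digits

# Print the last code point of a few planes and how many digits it needs.
# Plane 16 is the only plane whose code points need a fifth digit.
for plane in (0, 1, 15, 16):
    last = plane * 0x10000 + 0xFFFF
    print(plane, to_base32(last), len(to_base32(last)))
```

Since no padding is used, low code points come out even shorter than 4 digits.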


The fundamental idea of a "Unicode character introducer" sequence is solid. In the spirit of Morse shorthand, I recommend a simple concatenation of "U" (..-) and "+" (.-.-.), that is, the sequence "..-.-.-." sent as a single letter with no internal spaces. This would be followed by the base32 sequence, made as short as possible, and terminated with a word-space.


Thus we have:


羽 (U+7FBD):  ..-.-.-.   --..   -..-   -..-  (U⁺ZXX)
🫥 (U+1FAE5):  ..-.-.-.   ...--   -.--   --.-   .....   (U⁺3YQ5) 


Hopefully I did not mess those examples up, but I think the point gets across regardless.
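The two examples above can be checked mechanically. What follows is only a sketch under stated assumptions: Crockford's alphabet, standard International Morse for letters and digits, three spaces between letters as in the examples, and a made-up function name encode. The terminating word-space is left to the caller.

```python
# Crockford base32 alphabet and the matching International Morse codes.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
MORSE = dict(zip(CROCKFORD, (
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----. "
    ".- -... -.-. -.. . ..-. --. .... .--- -.- -- -. .--. --.- "
    ".-. ... - ...- .-- -..- -.-- --.."
).split()))

# "U" (..-) and "+" (.-.-.) run together as a single letter.
INTRODUCER = "..-" + ".-.-."

def encode(char: str, letter_space: str = "   ") -> str:
    """Introducer plus shortest base32 spelling of one character (hypothetical)."""
    code_point = ord(char)
    digits = ""
    while True:
        code_point, remainder = divmod(code_point, 32)
        digits = CROCKFORD[remainder] + digits
        if code_point == 0:
            break
    return letter_space.join([INTRODUCER] + [MORSE[d] for d in digits])

print(encode("\u7fbd"))      # U+7FBD: ..-.-.-.   --..   -..-   -..-
print(encode("\U0001FAE5"))  # U+1FAE5: ..-.-.-.   ...--   -.--   --.-   .....
```

Both lines agree with the hand-written examples, so ZXX and 3YQ5 are the right digits under these assumptions.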


In most cases, the ambiguity of whether the terminating word-space should be read as a word-space or as a letter-space (i.e. the current word continues after the Unicode character) can be resolved from context. However, if absolutely necessary, another plus sign can be added to the sequence to indicate word-continuation (i.e. the terminating space should be read as a letter-space).


Cheers,
Sławomir Osipiuk

On Tuesday, 01 August 2023, 10:16:41 (-04:00), William_J_G Overington via Unicode wrote:



https://punster.me/serif/viewtopic.php?id=455





William Overington




Tuesday 1 August 2023

