Breaking barriers

James Kass jameskass at code2001.com
Sat Oct 23 18:02:37 CDT 2021


On 2021-10-23 10:59 PM, Asmus Freytag via Unicode wrote:
> If you know the language, you can play with frequency data and try to guess
> mapping tables. You'll probably get most of the singleton to singleton mappings
> correct, and then you could use various forms of trial and error, such as
> genetic algorithms to locate and assign n:m mappings.
>
> If the language is not known, but among a set of known languages for which there
> is existing data, I wouldn't be surprised to learn that you could adapt simple
> language recognition algorithms to be independent of encoding details, and
> either identify the actual language, or sharply limit the candidates.
>
> After that, you'd re-run the recognition algorithm with each candidate
> transcoding table.
>
> I'm not an expert on this, but I did cobble together my own toy language
> recognition code at one time, including using some genetic algorithm to improve
> its sensitivity. Fun stuff and I was surprised how well that worked with only a
> few hours of effort.
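
The first step described above, guessing the one-to-one mappings from
frequency data, might look roughly like this in Python.  This is only a
sketch: the function name, the toy sample, and the reference frequency
order are made up for illustration, and it does nothing about the n:m
mappings.

```python
from collections import Counter

def guess_singleton_mapping(encoded_text, reference_order):
    """Pair each code seen in encoded_text with a reference character,
    matching them up by frequency rank (most frequent first)."""
    counts = Counter(ch for ch in encoded_text if not ch.isspace())
    # Descending count; ties broken by code point so the result is stable.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, reference_order))

# Hypothetical toy "font encoding" in which q, w, x stand for e, t, a.
sample = "qqqq www xx"
mapping = guess_singleton_mapping(sample, "eta")
decoded = "".join(mapping.get(ch, ch) for ch in sample)
print(mapping)
print(decoded)
```

With real text the ranks of similar-frequency letters will often be
swapped, which is where the trial and error comes in.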

That's a sophisticated approach.  For anyone lacking that level of
expertise, or without quick access to language
frequency/identification data, it might be more practical to locate the
modified font, open it in one of those font editors that display all
the glyphs in the font on a grid, open up the Unicode charts, and start
cross-mapping away.
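
Once the cross-mapping is done, the resulting table can be applied
mechanically.  A minimal Python sketch, assuming a hand-built table from
legacy byte values to Unicode strings; the three entries here are
invented for illustration and are not taken from any real font:

```python
# Hypothetical table: legacy byte value -> Unicode character(s).
LEGACY_TO_UNICODE = {
    0x41: "\u0915",              # DEVANAGARI LETTER KA
    0x42: "\u0916",              # DEVANAGARI LETTER KHA
    0x43: "\u0915\u094D\u0937",  # one legacy glyph -> KA + VIRAMA + SSA
}

def transcode(data, table):
    """Convert legacy-encoded bytes to a Unicode string using table."""
    out = []
    for b in data:
        # Unmapped bytes become U+FFFD so gaps in the table are easy to spot.
        out.append(table.get(b, "\uFFFD"))
    return "".join(out)

print(transcode(b"ABC", LEGACY_TO_UNICODE))
```

Mapping each byte to a string rather than to a single character leaves
room for glyphs that need more than one Unicode character.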

Or the font editor step could be skipped with a program that simply
displays every glyph in the font in use beside its character code.

Here's a simple dBASE III program that does that:

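* asctoo.prg
* Displays character codes 1 through 255, each beside the glyph that
* the screen font in use shows for it, 22 rows to a column.
CLEA ALL
SET TALK OFF
SET ECHO OFF
SET BELL OFF

CLEA
* aa = character code, ab = screen row, ac = screen column
aa = 1
ab = 1
ac = 0

DO WHIL aa < 256
   * ca = the code as a string without leading blanks, for the macro
   ca = STR(aa,3)
   IF aa < 10
      ca = SUBSTR(ca,3,1)
   ENDI aa
   IF aa < 100 .AND. aa > 9
      ca = SUBSTR(ca,2,2)
   ENDI
   * past row 22, return to the top row and move right one column
   IF ab = 23
      ab = 1
      IF ac < 21
         ac = ac + 6
      ELSE
         ac = ac + 7
      ENDI ac
   ENDI ab
   * skip code 7 (BEL, the bell character)
   IF aa # 7
      @ ab,ac SAY STR(aa,3) + "-" + CHR(&ca)
   ENDI aa
   aa = aa + 1
   ab = ab + 1
ENDD aa
CLEA ALL
@ 21,74 SAY "Press"
@ 22,75 SAY "Any"
@ 23,75 SAY "Key"
SET CONS OFF
WAIT
SET CONS ON
* EOF() asctoo.prg ............................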
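
For anyone without dBASE handy, here is a rough Python analogue of the
same grid.  It is not equivalent: it shows what a named codec maps each
byte value to, not what a modified font actually draws, so it only helps
when the font's layout follows a known code page.  The function name and
the cp437 default are my own choices for the sketch.

```python
def glyph_grid(codec="cp437", rows=22):
    """Return lines showing codes 1..255 as 'nnn-c' cells, filled
    column by column with `rows` cells per column."""
    cells = []
    for code in range(1, 256):
        ch = bytes([code]).decode(codec, errors="replace")
        if not ch.isprintable():
            ch = "."  # stand-in for control characters such as BEL
        cells.append(f"{code:3d}-{ch}")
    # Row r holds every rows-th cell starting at r (column-major fill).
    return [" ".join(cells[r::rows]) for r in range(rows)]

for line in glyph_grid():
    print(line)
```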


