Breaking barriers
James Kass
jameskass at code2001.com
Sat Oct 23 18:02:37 CDT 2021
On 2021-10-23 10:59 PM, Asmus Freytag via Unicode wrote:
> If you know the language, you can play with frequency data and try to guess
> mapping tables. You'll probably get most of the singleton-to-singleton mappings
> correct, and then you could use various forms of trial and error, such as
> genetic algorithms to locate and assign n:m mappings.
>
> If the language is not known, but among a set of known languages for which there
> is existing data, I wouldn't be surprised to learn that you could adopt simple
> language recognition algorithms to be independent of encoding details, and
> either identify the actual language, or sharply limit the candidates.
>
> After that, you'd re-run the recognition algorithm with each candidate
> transcoding table.
>
> I'm not an expert on this, but I did cobble together my own toy language
> recognition code at one time, including using some genetic algorithm to improve
> its sensitivity. Fun stuff, and I was surprised at how well that worked with
> only a few hours of effort.
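A minimal Python sketch of the first step in the quoted approach: frequency-rank matching to guess the singleton (1:1) mappings. It assumes the garbled text is a simple one-to-one substitution and that a reference sample in the known language is available. The function names are hypothetical, ties in frequency are broken arbitrarily (by code point), and the n:m mappings and genetic-algorithm refinement are not covered.

```python
from collections import Counter


def ranked_symbols(text):
    """Return the non-space symbols of `text`, most frequent first.

    Ties are broken by code point so the result is deterministic.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    return [ch for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def guess_singleton_mapping(garbled_text, reference_text):
    """Guess a 1:1 table by pairing symbols of equal frequency rank.

    Assumes the garbled text is a simple substitution of the language
    that `reference_text` represents.
    """
    return dict(zip(ranked_symbols(garbled_text), ranked_symbols(reference_text)))


def apply_mapping(text, mapping):
    """Transcode `text`, leaving unmapped symbols unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Usage: a toy "hacked font" text in which e, t, a, o were remapped.
reference = "eeeee ttt aa o"
garbled = reference.translate(str.maketrans("etao", "#!@%"))
table = guess_singleton_mapping(garbled, reference)
print(apply_mapping(garbled, table))  # eeeee ttt aa o
```

With real data the samples need to be long enough that the frequency ranks are stable; the guessed table is only a starting point for the trial-and-error step.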
That's a sophisticated approach. For anyone lacking that level of
expertise or quick access to language frequency/identification data,
it might be more practical to locate the modified font, open it in one
of those font editors that display all the glyphs in the font on a
grid, open up the Unicode charts, and start cross-mapping away.
Or the font-editor step could be skipped with a program that simply
displays every glyph in the font in use alongside its character code.
Here's a simple dBASE III program that does that:
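* asctoo.prg
* Shows character codes 1-255 with the glyph each one produces in the
* current display font, in columns of 22 rows.
CLEA ALL
SET TALK OFF
SET ECHO OFF
SET BELL OFF
clea
* aa = character code, ab = screen row, ac = screen column
aa = 1
ab = 1
ac = 0
DO WHIL aa < 256
   * Build the code as a string without leading blanks for the macro below
   ca = STR(aa,3)
   IF aa < 10
      ca = SUBSTR(ca,3,1)
   ENDI aa
   IF aa < 100 .AND. aa > 9
      ca = SUBSTR(ca,2,2)
   ENDI
   * After 22 rows, start a new column 6 (later 7) positions to the right
   IF ab = 23
      ab = 1
      IF ac < 21
         ac = ac + 6
      ELSE
         ac = ac + 7
      ENDI ac
   ENDI ab
   * Skip code 7 (BEL)
   IF aa # 7
      @ ab,ac SAY STR(aa,3) + "-" + CHR(&ca)
   ENDI aa
   aa = aa + 1
   ab = ab + 1
ENDD aa
CLEA ALL
@ 21,74 SAY "Press"
@ 22,75 SAY "Any"
@ 23,75 SAY "Key"
* Hide the WAIT prompt while waiting for a keypress
SET CONS OFF
WAIT
SET CONS ON
* EOF() asctoo.prg ............................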
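A rough Python counterpart of asctoo.prg, offered as a sketch. It assumes the modified font was built over the Windows-1252 code page (swap the codec name if it was built over another, such as cp437), and unlike the dBASE version it also records each character's Unicode code point to ease cross-mapping against the Unicode charts. The function names are hypothetical. Because code 7 is dropped rather than left as a blank slot, the columns are shifted by one entry relative to the dBASE screen.

```python
def glyph_table(codec="cp1252", skip=(7,)):
    """List (byte value, character, code point) for bytes 1-255.

    Each byte is decoded through a legacy code page (an assumption; use
    whichever one the modified font was built over). Non-printable
    characters are shown as '.', and bytes the code page leaves
    undefined are shown as '?' with no code point.
    """
    rows = []
    for value in range(1, 256):
        if value in skip:
            continue  # like the dBASE version, leave out 7 (BEL)
        try:
            ch = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            rows.append((value, "?", None))
            continue
        rows.append((value, ch if ch.isprintable() else ".", ord(ch)))
    return rows


def format_grid(rows, per_column=22, cell_width=6):
    """Lay entries out as 'NNN-c', filling 22 rows per column."""
    columns = [rows[i:i + per_column] for i in range(0, len(rows), per_column)]
    lines = []
    for r in range(per_column):
        line = ""
        for column in columns:
            if r < len(column):
                value, ch, _ = column[r]
                line += f"{value:3d}-{ch}".ljust(cell_width)
        lines.append(line.rstrip())
    return lines


if __name__ == "__main__":
    # Run in a terminal whose font is the modified one to see its glyphs.
    print("\n".join(format_grid(glyph_table())))
```

The third field of each row (for example U+20AC for byte 128 under cp1252) is what you would look up in the Unicode charts when building the cross-mapping.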