ICU data (only .ucm mappings for now) in Rust
Fredrick Brennan
copypaste at kittens.ph
Mon May 24 11:01:21 CDT 2021
Hello!
I am writing a font editor in Rust called MFEK, which required me to explore the
ICU data.
I didn't want to use libicu's C bindings just to get this data, so after some trial
and error I figured out how to include the data in the library in compressed form,
and added an in-memory index for it.
The crate is called icu-data: https://docs.rs/icu-data/0.1.0/icu_data/[1]
crates.io page: https://crates.io/crates/icu-data[2]
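To illustrate the general idea of an in-memory index over embedded data, here is a minimal sketch. It is not the crate's actual implementation: the `DataStore` type and its methods are hypothetical names, compression is left out entirely, and the buffer is built at run time rather than embedded at compile time.

```rust
use std::collections::HashMap;

/// Hypothetical container: one contiguous buffer plus a name -> byte-range index.
/// (Assumption for illustration only; not the icu-data API.)
pub struct DataStore {
    blob: Vec<u8>,
    index: HashMap<String, (usize, usize)>, // name -> (offset, length)
}

impl DataStore {
    /// Pack named files into one buffer and remember where each one lives.
    pub fn build(files: &[(&str, &str)]) -> Self {
        let mut blob = Vec::new();
        let mut index = HashMap::new();
        for (name, contents) in files {
            // Record the slice this file will occupy before appending it.
            index.insert(name.to_string(), (blob.len(), contents.len()));
            blob.extend_from_slice(contents.as_bytes());
        }
        DataStore { blob, index }
    }

    /// Look a file up by name without scanning the buffer.
    pub fn get(&self, name: &str) -> Option<&str> {
        let &(offset, len) = self.index.get(name)?;
        std::str::from_utf8(&self.blob[offset..offset + len]).ok()
    }
}
```

A lookup such as `store.get("glibc-IBM437-2.1.2")` then costs one hash lookup plus a slice, whatever the number of mapping files packed into the buffer.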
For example, if you look up "glibc-IBM437-2.1.2", you get something back that
starts like this:
Encoding {
    metadata: {"mb_cur_max": "1", "mb_cur_min": "1", "subchar": "\\x1A",
               "uconv_class": "SBCS", "code_set_name": "IBM437"},
    codepoints: [
        Codepoint { uni: '\u{0}', eq_type: Type0, bytestring: [0] },
        Codepoint { uni: '\u{1}', eq_type: Type0, bytestring: [1] },
        Codepoint { uni: '\u{2}', eq_type: Type0, bytestring: [2] },
        Codepoint { uni: '\u{3}', eq_type: Type0, bytestring: [3] },
        ...
    states: [] }
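For readers unfamiliar with the .ucm format, each entry above corresponds to a mapping line such as `<U0001> \x01 |0`: a Unicode code point, the byte sequence, and a precision indicator. The sketch below shows how one such line could be turned into a record of that shape. The type and field names mirror the debug output, but the definitions are my own assumptions, as are the `Type1` to `Type3` variant names and the `parse_mapping_line` function; it handles only a single code point per line and the indicators `|0` to `|3`.

```rust
/// Precision indicator from the `|N` suffix of a mapping line.
/// Only `Type0` appears in the output above; the other names are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EqType {
    Type0, // |0: roundtrip mapping
    Type1, // |1: fallback (Unicode -> bytes only)
    Type2, // |2: maps to the substitution character
    Type3, // |3: reverse fallback (bytes -> Unicode only)
}

/// Hypothetical record shaped like the `Codepoint` entries in the output.
#[derive(Debug, PartialEq)]
pub struct Codepoint {
    pub uni: char,
    pub eq_type: EqType,
    pub bytestring: Vec<u8>,
}

/// Parse one mapping line such as `<U0041> \x41 |0`.
/// Returns `None` for anything that is not a simple mapping line.
pub fn parse_mapping_line(line: &str) -> Option<Codepoint> {
    let mut parts = line.split_whitespace();

    // `<U0041>` -> "0041" -> 'A'
    let hex = parts.next()?.strip_prefix("<U")?.strip_suffix('>')?;
    let uni = char::from_u32(u32::from_str_radix(hex, 16).ok()?)?;

    // `\x41` -> [0x41]; multi-byte `\x81\x40` -> [0x81, 0x40]
    let bytestring = parts
        .next()?
        .strip_prefix("\\x")?
        .split("\\x")
        .map(|b| u8::from_str_radix(b, 16).ok())
        .collect::<Option<Vec<u8>>>()?;

    let eq_type = match parts.next()? {
        "|0" => EqType::Type0,
        "|1" => EqType::Type1,
        "|2" => EqType::Type2,
        "|3" => EqType::Type3,
        _ => return None,
    };

    Some(Codepoint { uni, eq_type, bytestring })
}
```

Header lines such as `<code_set_name> "IBM437"` and the `CHARMAP` / `END CHARMAP` markers are rejected by this function and would need separate handling to populate the metadata map.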
I don't need anything but the UCM charset files at this time, but if anyone is
interested in adding a module for the other ICU data, this is a place to start.
Best,
Fred Brennan
--------
[1] https://docs.rs/icu-data/0.1.0/icu_data/
[2] https://crates.io/crates/icu-data