Unicode String Models

Mark Davis ☕️ via Unicode unicode at unicode.org
Sun Sep 9 03:00:27 CDT 2018


Thanks, excellent comments. While it is clear that some string models have
more complicated structures (with their own pros and cons), my focus was on
simple internal structures. The focus was also on immutable strings; the
tradeoffs for mutable ones can be quite different, and the paper needs to
make that clearer. I'll add some material about those two areas (with
pointers to sources where possible).

Mark


On Sat, Sep 8, 2018 at 9:20 PM John Cowan <cowan at ccil.org> wrote:

> This paper makes the default assumption that the internal storage of a
> string is a featureless array.  If this assumption is abandoned, it is
> possible to get O(1) indexes with fairly low space overhead.  The Scheme
> language has recently adopted immutable strings called "texts" as a
> supplement to its pre-existing mutable strings, and the sample
> implementation for this feature uses a vector of either native strings or
> bytevectors (char[] vectors in C/Java terms).  I would urge anyone
> interested in the question of storing and accessing immutable strings to read
> the following parts of SRFI 135 at <
> https://srfi.schemers.org/srfi-135/srfi-135.html>:  Abstract, Rationale,
> Specification / Basic concepts, and Implementation.  In addition, the
> design notes at <https://github.com/larcenists/larceny/wiki/ImmutableTexts>,
> though not up to date (in particular, UTF-16 internals are now allowed as
> an alternative to UTF-8), are of interest: unfortunately, the link to the
> span API has rotted.
>
> On Sat, Sep 8, 2018 at 12:53 PM Mark Davis ☕️ via Unicore <
> unicore at unicode.org> wrote:
>
>> I recently did some extensive revisions of a paper on Unicode string
>> models (APIs). Comments are welcome.
>>
>>
>> https://docs.google.com/document/d/1wuzzMOvKOJw93SWZAqoim1VUl9mloUxE0W6Ki_G23tw/edit#
>>
>> Mark
>>
>
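
To make the point above about abandoning the "featureless array"
assumption concrete, here is a minimal sketch of an immutable text stored
as a vector of fixed-size chunks. It is not the SRFI 135 sample
implementation; the class name ChunkedText, the chunk size of 128 code
points, and the choice of UTF-8 chunks are all assumptions made for
illustration. Indexing picks the chunk by integer division and then scans
at most one chunk, so the cost is bounded by a constant regardless of text
length. The sketch also assumes well-formed input (no lone surrogates).

```python
CHUNK = 128  # code points per chunk (hypothetical tuning constant)


def _seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


class ChunkedText:
    """Immutable text held as a tuple of UTF-8 chunks of CHUNK code points."""

    def __init__(self, s: str):
        self._length = len(s)
        # Every chunk except possibly the last holds exactly CHUNK code points.
        self._chunks = tuple(
            s[i:i + CHUNK].encode("utf-8") for i in range(0, len(s), CHUNK)
        )

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < self._length:
            raise IndexError("text index out of range")
        chunk_no, offset = divmod(index, CHUNK)
        chunk = self._chunks[chunk_no]
        count = min(CHUNK, self._length - chunk_no * CHUNK)
        if len(chunk) == count:
            # One byte per code point means the chunk is pure ASCII.
            return chr(chunk[offset])
        # Otherwise skip `offset` code points; the scan never leaves the chunk.
        pos = 0
        for _ in range(offset):
            pos += _seq_len(chunk[pos])
        return chunk[pos:pos + _seq_len(chunk[pos])].decode("utf-8")


text = ChunkedText("a" * 200 + "é☕𝄞" + "z")
print(len(text), text[0], text[200], text[201], text[202], text[203])
```

The space overhead here is one reference per chunk, and all-ASCII chunks
are indexed directly without any scan. A mutable string would need a
different design, since an edit can change how code points fall into
chunks.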