Counting Codepoints

Richard Wordingham richard.wordingham at ntlworld.com
Tue Oct 13 13:44:39 CDT 2015


On Tue, 13 Oct 2015 14:08:28 +0200
Mark Davis ☕️ <mark at macchiato.com> wrote:

> On Tue, Oct 13, 2015 at 8:36 AM, Richard Wordingham <
> richard.wordingham at ntlworld.com> wrote:

> > Rather the question must be the unwieldy one of how
> > many scalar values and lone surrogates it contains in total.

> That may be the question in theory; in practice no programming
> language is going to support APIs like that.

And yet that same message then exhibits such an API in Java!

> // for the last, could just call: count = (int) test.codePoints().count();

The challenge is rather one of expressing the task.

Perhaps: "What is the sum of the number of scalar values and the
number of lone surrogates in this Unicode 16-bit string?"

Maybe even: "What is the sum of the numbers of non-surrogate
codepoints, surrogate pairs and lone surrogates in this Unicode 16-bit
string?"
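
As a rough illustration of that second wording, here is a minimal Java
sketch that tallies the three categories separately. The class name
SurrogateTally and its fields are hypothetical names invented for this
example. Only Character.isSurrogate, Character.isHighSurrogate,
Character.isLowSurrogate and String.codePointCount are standard library
calls. The sum of the three tallies should agree with what
codePointCount reports, since that method counts each unpaired
surrogate as one code point.

```java
// Hypothetical helper for illustration; not part of any library.
final class SurrogateTally {
    final int nonSurrogates;   // code units that are not surrogates
    final int surrogatePairs;  // high surrogate immediately followed by a low surrogate
    final int loneSurrogates;  // any other surrogate code unit

    SurrogateTally(int nonSurrogates, int surrogatePairs, int loneSurrogates) {
        this.nonSurrogates = nonSurrogates;
        this.surrogatePairs = surrogatePairs;
        this.loneSurrogates = loneSurrogates;
    }

    // Scalar values are the non-surrogates plus the surrogate pairs.
    int scalarValues() {
        return nonSurrogates + surrogatePairs;
    }

    int total() {
        return nonSurrogates + surrogatePairs + loneSurrogates;
    }

    static SurrogateTally of(CharSequence s) {
        int non = 0, pairs = 0, lone = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isSurrogate(c)) {
                non++;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                pairs++;
                i++; // skip the low half of the pair
            } else {
                lone++;
            }
        }
        return new SurrogateTally(non, pairs, lone);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 as a pair, lone high surrogate, 'b', lone low surrogate
        String test = "a\uD83D\uDE00\uD800b\uDC00";
        SurrogateTally t = SurrogateTally.of(test);
        System.out.println(t.nonSurrogates + " " + t.surrogatePairs + " " + t.loneSurrogates); // 2 1 2
        System.out.println(t.total() == test.codePointCount(0, test.length())); // true
    }
}
```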

It's slightly less unwieldy in the context where I actually want the
expression: "Go back for a grand total of x non-surrogate codepoints,
surrogate pairs or lone surrogates."
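
For what it's worth, Java's String.offsetByCodePoints appears to do
exactly this "go back" operation when given a negative offset, since it
counts each unpaired surrogate in the range as one code point. The
sketch below assumes a hypothetical wrapper named StepBack.back. Only
offsetByCodePoints itself is a real library method, and it throws
IndexOutOfBoundsException when fewer than x code points precede the
starting index.

```java
// Hypothetical wrapper for illustration; only offsetByCodePoints is a real API.
final class StepBack {
    /**
     * Returns the index reached by going back x units from index, where a
     * unit is a non-surrogate code unit, a surrogate pair or a lone surrogate.
     */
    static int back(String s, int index, int x) {
        return s.offsetByCodePoints(index, -x);
    }

    public static void main(String[] args) {
        // Indices: 0 'a', 1-2 surrogate pair, 3 lone high, 4 'b', 5 lone low
        String test = "a\uD83D\uDE00\uD800b\uDC00";
        // Going back 4 from the end passes the lone low surrogate, 'b',
        // the lone high surrogate and the pair, landing on the pair's start.
        System.out.println(back(test, test.length(), 4)); // 1
    }
}
```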

Richard.


