Specification of Encoding of Plain Text

Richard Wordingham
Fri Jan 13 03:02:32 CST 2017

On Thu, 12 Jan 2017 21:03:29 +0100
Mark Davis ☕️:

> Latin is not a complex script,...

Unlike the Common script, which notably has U+2044 FRACTION SLASH.

The statement that Latin is not complex is actually dubious from a
typographical point of view.

> ...so it was only an illustration.

But it's good for looking for the non-obvious issues.

> A more serious effort would look at some of the issues from
> http://unicode.org/reports/tr29/, for example.

I don't think we want to have to repeat them all for each script.
Putting Common-script punctuation and numbers in every script's regex
will make it more obscure, and possibly harder to maintain.
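
One way to avoid the repetition might be to keep the Common-script
characters in a single shared fragment and build each script's regex
from it. The following is only a rough sketch of that idea, not a
proposal for the actual regexes: the names COMMON, SCRIPT_LETTERS,
word_pattern and words are made up for illustration, the letter ranges
are drastically simplified, and the choice of shared characters (ASCII
digits, apostrophe, U+2044) is an assumption.

```python
import re

# Hypothetical shared fragment: a few Common-script characters that may
# appear inside "words" of any script (ASCII digits, apostrophe,
# U+2044 FRACTION SLASH). Written as character-class content.
COMMON = "0-9'\u2044"

# Hypothetical, heavily simplified letter ranges per script.
# Real script repertoires are far larger than this.
SCRIPT_LETTERS = {
    "Latin": "A-Za-z",
    "Greek": "\u0391-\u03A9\u03B1-\u03C9",
}

def word_pattern(script):
    """Build one script's word regex from its letters plus the shared fragment."""
    return re.compile("[" + SCRIPT_LETTERS[script] + COMMON + "]+")

def words(script, text):
    """Return the runs of text matched by the given script's word regex."""
    return word_pattern(script).findall(text)
```

With this layout a change to the Common-script handling is made once,
in COMMON, and every script's pattern picks it up; for instance,
words("Latin", "1\u20442 cup") keeps the fraction together as one match.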


