Unicode education in Schools
Philippe Verdy via Unicode
unicode at unicode.org
Thu Aug 24 13:19:10 CDT 2017
2017-08-24 19:17 GMT+02:00 Andre Schappo via Unicode <unicode at unicode.org>:
> Because there are many systems that can now handle BMP characters but
> cannot handle SMP characters.
> One example being systems that use mysql utf8 (3 byte encoding) and have
> not yet updated to utf8mb4 (4 byte encoding)
MySQL's utf8 is known to cause severe problems, notably on wikis installed
by default with it: the presence of any non-BMP character (SMP characters
and emojis are now very frequent and available on almost all modern
smartphones) in the edited text will cause its **silent** truncation when
it is uploaded to the server (when the server saves the text to the
database), even if any unsaved preview was correct. You will see the
truncation only once the saved page is loaded.
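To make the failure mode concrete, here is a minimal Python sketch. It is
only a model of the behaviour described above, not MySQL code: the function
names are hypothetical, and it assumes the stored text is cut at the first
character outside the BMP.

```python
def needs_four_bytes(ch: str) -> bool:
    """True if the code point lies outside the BMP (4 bytes in UTF-8)."""
    return ord(ch) > 0xFFFF


def simulate_silent_truncation(text: str) -> str:
    """Hypothetical model: keep only what precedes the first non-BMP
    character, as a 3-byte "utf8" column is described as doing."""
    for index, ch in enumerate(text):
        if needs_four_bytes(ch):
            return text[:index]
    return text


sample = "Price list \U0001F600 item A, item B"
stored = simulate_silent_truncation(sample)

# Everything after the emoji is gone, and nothing signalled it.
print(repr(stored))
# The emoji needs 4 bytes in UTF-8, one more than the 3-byte limit.
print(len("\U0001F600".encode("utf-8")))
```

Text containing only BMP characters (accented letters, CJK, and so on)
passes through this model unchanged, which is why the problem can stay
unnoticed for a long time.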
MySQL's "utf8" should have been dropped long ago and replaced by utf8mb4,
or set up so that data sent to a "utf8"-encoded database would cause an SQL
error that cannot be silently ignored with truncation (or at least it
should only cause the non-BMP characters to be filtered out, without
silently deleting everything that follows).
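Until the database itself is fixed, an application can enforce either
policy before the text reaches it. The following Python sketch shows both
options under that assumption; the class and function names are
hypothetical and are not part of any MySQL client library.

```python
class NonBmpCharacterError(ValueError):
    """Raised when text cannot be stored in a 3-byte "utf8" column."""


def require_bmp_only(text: str) -> str:
    """Strict policy: refuse the text instead of letting it be truncated."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise NonBmpCharacterError(
                f"non-BMP character U+{ord(ch):04X} at index {index}"
            )
    return text


def strip_non_bmp(text: str) -> str:
    """Lenient policy: drop only the non-BMP characters and keep
    everything that follows them."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)
```

The strict variant is the safer default, since the caller is forced to
handle the error; the lenient variant still loses data, but only the
characters that cannot be stored.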
This is an old, severe bug in MySQL (in the server itself, in the
connection protocol, or in internal filters used by the MySQL client
library) that has caused many severe security issues (such as discarded
logs or todo lists, loss of pending commercial transactions such as lists
of payments to process to a bank, truncated billings sent to customers,
loss of contact address or name, broken complete addresses for product
delivery to a customer, or missing items in a delivered box and lost
products in the middle of their routing).
This is a demonstration that not signaling encoding errors to an
application, or not clearly specifying that an API may raise encoding
exceptions that must be caught and must not be ignored in applications, can
hurt. Even if you use "utf8mb4", encoding errors are still possible and must
not be ignored, as the final result will otherwise be unpredictable.
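As an illustration of the difference between signaled and ignored encoding
errors, here is a short Python sketch using the standard codec error
handlers. It does not involve MySQL at all; the helper name
decode_or_report is hypothetical.

```python
def decode_or_report(raw: bytes) -> str:
    """Decode UTF-8 strictly and turn any failure into an explicit error."""
    try:
        return raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"ill-formed UTF-8 at byte offset {exc.start}"
        ) from exc


# 0xE9 is "é" in Latin-1, but here it is an incomplete UTF-8 sequence.
bad = b"caf\xe9 au lait"

# Ignoring the error silently changes the data: the "é" disappears.
lossy = bad.decode("utf-8", errors="ignore")
print(repr(lossy))

# A lone surrogate is not encodable as UTF-8 either, even though the
# 4-byte form of the encoding is available.
try:
    "\ud83d".encode("utf-8")
except UnicodeEncodeError as exc:
    print("encoding refused:", exc.reason)
```

Calling decode_or_report(bad) raises ValueError instead of returning
altered text, which is the behaviour an application should insist on.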