Batch conversion from "normal" Unicode text to NCRs

Tue Nov 12 11:52:41 CST 2024

Giacomo Catenazzi wrote:

> I'm not sure NCR is the best way to go (also decades ago): it is just
> a numeric representation and not semantic (as with other HTML
> entities), and so finding problems may become difficult.
>
> In my opinion, you should remove all NCR, else it would be a nightmare
> to check wrong encoding (maybe some of NCR where in Latin1, and some
> in Unicode, and the problem often we have double encoding). Also it
> makes difficult to correct spell. And also it is more simple to handle
> for people with all kind of experience. (Now UTF-8 can be used with
> all tools).
>
> So I would try to transform text as UTF-8 without NCR (web now is
> default UTF-8).

I’ve noticed in 26 years on this mailing list that there is often an irresistible desire, when someone posts that they need a solution to problem A, to respond that they really shouldn’t do that, but that they should solve problem A′ or problem B instead.

It’s worth noting that António said he was asking this question for a friend, i.e. the person who actually needs to do this is not here to defend himself.

Modern editors, browsers, fonts, etc. support Unicode better today than ever. But it will always be the case that some people are forced to work with old tools or standards, and need a workaround, unpalatable though that may be. I am encountering this at {dayjob} myself.
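
For what it's worth, the conversion originally asked about (problem A) can be quite small. Here is a minimal sketch in Python, assuming the input has already been decoded correctly into Unicode text and that every non-ASCII character should become an NCR. The function names to_ncr and to_hex_ncr are my own, purely for illustration, and this is one possible approach rather than the one António's friend necessarily needs.

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal NCR (&#NNN;)."""
    # The built-in 'xmlcharrefreplace' error handler performs the
    # substitution. ASCII characters, including '&' and '<', pass
    # through unchanged, so existing markup is left alone.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def to_hex_ncr(text: str) -> str:
    """Same idea, but emitting hexadecimal NCRs (&#xHHHH;)."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )
```

For a batch job, either function could be applied to each file's contents after reading it with the correct source encoding. Getting that source encoding right is the part that actually needs care, since mis-decoded input is where double-encoding problems come from.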

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org