Why do binary files contain text but text files don't contain binary?

Richard Wordingham via Unicode unicode at unicode.org
Fri Feb 21 10:17:09 CST 2020


On Fri, 21 Feb 2020 15:53:52 +0000
"Costello, Roger L. via Unicode" <unicode at unicode.org> wrote:

> Based on a private correspondence, I now realize that this statement:
>
> > Text files do not contain binary
>
> is not correct.
>
> Text files may indeed contain binary (i.e., bytes that are not
> interpretable as characters). Namely, text files may contain
> newlines, tabs, and some other invisible things.
>
> Question: "characters" are defined as only the visible things, right?

No, white space (e.g. spaces, tabs and newlines) is normally considered
to be composed of characters.  And then there are things that are much
harder to discern, such as zero-width spaces, line-break suppressors
such as U+2060 WORD JOINER, and soft hyphens (interpreted as line-break
opportunities).
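
As a rough illustration of the point above (a sketch added for clarity, not
part of the original message; the names `INVISIBLE` and `describe` are made
up for the example), Python's standard `unicodedata` module can be asked
what it knows about each of these invisible items. Every one of them has a
code point, a general category and an ordinary UTF-8 encoding, which is
what being a character amounts to here.

```python
import unicodedata

# Characters mentioned in the reply; none of them draws any ink.
INVISIBLE = ["\u0020", "\u0009", "\u000A", "\u200B", "\u2060", "\u00AD"]

def describe(ch):
    """Return (code point, general category, name, UTF-8 bytes) for one character."""
    # Control characters such as tab and newline have no formal Unicode name,
    # so a placeholder is supplied instead of letting name() raise ValueError.
    name = unicodedata.name(ch, "<control>")
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), name, ch.encode("utf-8"))

for ch in INVISIBLE:
    print(*describe(ch))
```

The space is reported with category Zs (space separator), tab and newline
with Cc (control), and the zero-width space, word joiner and soft hyphen
with Cf (format).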

Richard.

