A file contains text data and binary data ... Is it a text file or a binary file?

Markus Scherer markus.icu at gmail.com
Mon Sep 14 14:55:37 CDT 2020


I think many people would call that a binary file with an internal
structure where the first part is text and the second part is binary.

I suspect there is a "magic sequence" of some kind as a separator.
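
As a rough illustration of that guess, here is a minimal sketch of splitting such a file at a separator. The MAGIC value and the split_mixed helper are hypothetical, invented for this example; a real format would define its own marker, or a length field instead, and the text part is assumed to be UTF-8.

```python
from typing import Tuple

# Hypothetical separator, made up for this sketch. A real format would
# define its own marker.
MAGIC = b"\x00--BINARY--\x00"


def split_mixed(blob: bytes) -> Tuple[str, bytes]:
    """Split blob into (text, binary) at the first occurrence of MAGIC.

    Assumes the part before the separator is UTF-8 text.
    """
    head, sep, tail = blob.partition(MAGIC)
    if not sep:
        raise ValueError("separator not found")
    # Only the text part is decoded; the binary part stays as raw bytes.
    return head.decode("utf-8"), tail
```

With a split like this, each part can be handled by tools that suit it, rather than forcing one label onto the whole file.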

On Mon, Sep 14, 2020 at 12:48 PM Roger L Costello via Unicode <
unicode at unicode.org> wrote:

> Is there a practical, real-world problem with treating it as a text file?
>

It depends on what you do with it. For some operations, it will "work" or be
harmless. In other cases, tools will barf at you because they tried to
validate the file as, say, UTF-8 text and found errors.
Or the result may not be useful. What if you count the number of lines of
"text" and the tool gives you an arbitrary result based on whatever bytes it
happens to find in the binary part?
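
Both failure modes are easy to reproduce. The following is only a sketch: the byte values are made up for illustration, and is_valid_utf8 and count_newlines are hypothetical helpers, not part of any particular tool.

```python
# Made-up mixed content: UTF-8 text followed by arbitrary binary bytes.
text_part = "first line\nsecond line\n".encode("utf-8")
binary_part = bytes([0xFF, 0xFE, 0x0A, 0x00, 0x0A, 0x80, 0x0A])
data = text_part + binary_part


def is_valid_utf8(blob: bytes) -> bool:
    """Return True if blob decodes as strict UTF-8."""
    try:
        blob.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def count_newlines(blob: bytes) -> int:
    """Count 0x0A bytes, the way a naive line counter would."""
    return blob.count(b"\n")


print(is_valid_utf8(text_part))   # True
print(is_valid_utf8(data))        # False: 0xFF never occurs in valid UTF-8
print(count_newlines(text_part))  # 2
print(count_newlines(data))       # 5: three 0x0A bytes sit in the binary part
```

The text part alone has two lines, but the whole file reports five, because three bytes in the binary part happen to equal 0x0A.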

Why do you need to give it a single attribute of "text" or "binary"?
It's a bit like asking about the single language of a text that contains
paragraphs in different languages.

markus