A file contains text data and binary data ... Is it a text file or a binary file?

Doug Ewell doug at ewellic.org
Mon Sep 14 15:08:23 CDT 2020


Roger L Costello wrote:

> A file contains a long series of text data and at the end is binary
> data. The binary data is not encoded as base64 text or anything like
> that. It is raw, unfiltered, unencoded binary data.
>
> Is it a text file or a binary file?

I seem to remember a question about this distinction some months ago. IIRC it devolved into a discussion about LF versus CRLF and "text mode" versus "binary mode" file transfers, which of course is not what is (or was) being asked.

> A colleague argues that it may be legitimately treated as a text file.
> After all, it can be opened in a text editor.

You can try to open any file in a text editor. What the editor then does (display it in a meaningful way, show binary garbage, or decline to open it) is another matter.
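As a rough illustration of how a tool might decide between those outcomes, here is a minimal Python sketch. It is an assumption on my part, not a description of any particular editor: the helper name `first_undecodable_offset` and the sample bytes are made up, and real editors use a variety of heuristics.

```python
from typing import Optional


def first_undecodable_offset(data: bytes, encoding: str = "utf-8") -> Optional[int]:
    """Return the offset of the first byte that fails strict decoding, or None."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Made-up sample: ASCII text followed by the raw bytes FF AA BC 20 54.
sample = b"hello\n" + bytes([0xFF, 0xAA, 0xBC, 0x20, 0x54])

# Pure ASCII decodes cleanly as UTF-8, so there is no failing offset.
print(first_undecodable_offset(b"plain text\n"))

# 0xFF can never appear in well-formed UTF-8, so strict decoding stops there.
print(first_undecodable_offset(sample))

# Latin-1 assigns a character to every byte value, so decoding never fails;
# an editor that assumes Latin-1 will simply display odd-looking characters.
print(first_undecodable_offset(sample, "latin-1"))
print(repr(sample.decode("latin-1")))
```

Note that a clean decode does not prove the file is text: arbitrary binary data can happen to be valid in the chosen encoding, and in a single-byte encoding like Latin-1 it always is.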

> The text editor might display odd-looking characters such as this:
>
> ÿª¼ T
>
> But that is harmless.

Harmless only as long as any changes the text editor might make are not saved back to the file. Many text editors convert tabs to spaces or vice versa, strip trailing spaces from lines, convert between LF and CRLF, and so forth. If the file is not intended to be text for human consumption, these can be breaking changes; if the file has arbitrary binary content, they will be.
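The kind of breakage described above can be sketched in a few lines of Python. Everything here is hypothetical: the file contents are invented, and `editor_save` only mimics two common editor "cleanups" (stripping trailing spaces and tabs, and writing CRLF line endings) applied to the whole byte stream.

```python
# Hypothetical file: a text header followed by raw, unencoded binary bytes.
# The binary tail happens to contain 0x0A (LF), 0x20 (space) and 0x09 (tab).
text_part = b"name: example  \nversion: 1\n"
binary_part = bytes([0xFF, 0xAA, 0x0A, 0x20, 0x0A, 0xBC, 0x09, 0x54])
original = text_part + binary_part


def editor_save(data: bytes) -> bytes:
    """Mimic an editor that strips trailing spaces/tabs and saves with CRLF."""
    lines = data.split(b"\n")
    cleaned = [line.rstrip(b" \t") for line in lines]
    return b"\r\n".join(cleaned)


saved = editor_save(original)

# The text part only changed cosmetically...
print(saved[:27])
# ...but the binary tail no longer survives intact anywhere in the output.
print(binary_part in saved)
print(len(original), len(saved))
```

The editor has no way to know where the text ends and the binary data begins, so bytes in the tail that merely look like line feeds or trailing spaces get rewritten along with the real ones.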

> Is there a practical, real-world problem with treating it as a text
> file?

Depends on what "treating" means. And I still think this is a false dichotomy.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org




