Is the binaryness/textness of a data format a property?

J Decker via Unicode unicode at unicode.org
Fri Mar 20 09:22:45 CDT 2020


On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode <
unicode at unicode.org> wrote:

> On Fri, Mar 20, 2020 at 12:21:26PM +0000, Costello, Roger L. via Unicode
> wrote:
> > [Definition] Property: an attribute, quality, or characteristic of
> > something.
> >
> > JPEG is a binary data format.
> > CSV is a text data format.
> >
> > Question #1: Is the binaryness/textness of a data format a property?
> >
> > Question #2: If the answer to Question #1 is yes, then what is the name
> > of this binaryness/textness property?
>
> I'm afraid this question is too fuzzy to have a proper answer.
>
> For example, most Unix-heads will tell you that UTF-16LE is a binary rather
> than text format.  Microsoft employees and some members of this list will
> disagree.
>
> Then you have PostScript -- nothing but basic ASCII, yet utterly unreadable
> for a (sane) human.
>
> If you want _my_ definition of a file being _technically_ text, it's:
> * no bytes 0..31 other than newlines and tabs (even form feeds are out
>   nowadays)
> * correctly encoded for the expected charset (and nowadays, if that's not
>   UTF-8 Unicode, you're doing it wrong)
> * no invalid characters
>

Just a minor note...
In the case of UTF-8, this means no bytes 0xF8-0xFF will ever be used (in
fact, well-formed UTF-8 never contains 0xC0, 0xC1, or 0xF5-0xFF); every
valid UTF-8 code unit has at least one bit off.
I wouldn't be so picky about 'no bytes 0-31', because \t, \n, and \x1b (ANSI
escape codes) are all quite usable...
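
For what it's worth, the three checks in the quoted list could be sketched
roughly like this in Python. This is only an illustration under some
assumptions: the names looks_like_text, is_noncharacter and ALLOWED_CONTROLS
are made up here, the expected charset is taken to be UTF-8, and "no invalid
characters" is read as "no Unicode noncharacters", which may not be what was
meant. The set of permitted control bytes is a parameter, so \x1b can be
allowed or not.

```python
# Hypothetical helpers for illustration; none of these names come from
# the thread or from any library.
ALLOWED_CONTROLS = frozenset(b"\t\n")  # tab and newline, as byte values


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def looks_like_text(data: bytes, allowed_controls=ALLOWED_CONTROLS) -> bool:
    """Apply the three checks from the quoted list, assuming UTF-8."""
    # 1. No bytes 0..31 other than the explicitly allowed ones.
    if any(b < 0x20 and b not in allowed_controls for b in data):
        return False
    # 2. Correctly encoded for the expected charset. Strict decoding
    #    rejects bytes such as 0xF8-0xFF, overlong forms and surrogates.
    try:
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # 3. No invalid characters (assumption: this means noncharacters).
    return not any(is_noncharacter(ord(ch)) for ch in text)
```

With the defaults, looks_like_text(b"\x1b[1mbold\x1b[0m") is rejected; passing
allowed_controls=ALLOWED_CONTROLS | {0x1B} accepts it. UTF-16LE input fails
either way, since its NUL bytes trip the first check.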



>
> But besides this narrow technical meaning -- is a Word document "text"?
> And if it is, why not Powerpoint?  This all falls apart.
>
>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁ in the beginning was the boot and root floppies and they were good.
> ⢿⡄⠘⠷⠚⠋⠀                                       -- <willmore> on #linux-sunxi
> ⠈⠳⣄⠀⠀⠀⠀
>