A neat description of encoding characters
梁海 Liang Hai via Unicode
unicode at unicode.org
Mon Dec 2 08:49:07 CST 2019
Grrr… It’s an okayish analogy for binary numbers, but not really relevant to character encoding. Encoded characters are simply assigned integers, which can in turn be represented in any base.
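To make the "any base" point concrete, here is a minimal Python sketch. It assumes nothing beyond the built-ins `ord`, `chr`, `format` and `int`; the choice of "A" and the dictionary name `representations` are only for illustration.

```python
# A character's code point is an integer; the base is only a display choice.
ch = "A"
code_point = ord(ch)  # the integer assigned to "A"

# The same integer written in four different bases.
representations = {
    "decimal": str(code_point),
    "hexadecimal": format(code_point, "X"),
    "octal": format(code_point, "o"),
    "binary": format(code_point, "b"),
}
for base, text in representations.items():
    print(f"{base:>11}: {text}")

# Every spelling parses back to the same integer, hence the same character.
same_integer = int("41", 16) == int("101", 8) == int("1000001", 2) == code_point
print(same_integer, chr(code_point))
```

None of the four spellings is more "the encoding" than the others; binary is just the one that hardware happens to use.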
The binary way computers store numbers does not have much to do with how character encoding works, unless you really want to start explaining character encoding from ideas as basic as “What is electricity?”, “What is a computer?”, …
Best,
梁海 Liang Hai
https://lianghai.github.io
> On Dec 2, 2019, at 20:01, Costello, Roger L. via Unicode <unicode at unicode.org> wrote:
>
> From the book "Computer Power and Human Reason" by Joseph Weizenbaum, pp. 74-75:
>
> Suppose that the alphabet with which we wish to concern ourselves consists of 256 distinct symbols. Imagine that we have a deck of 256 cards, each of which has a distinct symbol of our alphabet printed on it, and, of course, such that there corresponds one card to each symbol. How many questions that can be answered "yes" or "no" would one have to ask, given one card randomly selected from the deck, in order to be able to decide which character is printed on the card? We can certainly make the decision by asking at most 256 questions. We can somehow order the symbols and begin by asking if it is the first in our ordering, e.g., "Is it an uppercase A?" If the answer is "no," then we ask if it is the second, and so on. But if our ordering is known both to ourselves and to our respondent, there is a much more economical way of organizing our questioning. We ask whether the character we are seeking is in the first half of the set. Whatever the answer, we will have isolated a set of 128 characters among which the character we seek resides. We again ask whether it is in the first half of that smaller set, and so on. Proceeding in this way, we are bound to discover what character is printed on the selected card by asking exactly eight questions. We could have recorded the answers we received to our questions by writing "1" whenever the answer was "yes" and "0" whenever it was "no." That record would then consist of eight so-called bits each of which is either "1" or "0". This eight-bit string is then an unambiguous representation of the character we are seeking. Moreover, each character of the whole set has a unique eight-bit representation within the same ordering.
>
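For readers who want to try the halving procedure from the quoted passage, here is a rough Python sketch. The function name `ask_questions` and the 256-symbol alphabet built from `chr(0)` through `chr(255)` are assumptions made for illustration; the book does not specify any particular ordering.

```python
def ask_questions(alphabet, target):
    """Record the yes/no answers that isolate target, "1" for yes, "0" for no."""
    index = alphabet.index(target)   # position in the agreed ordering
    lo, hi = 0, len(alphabet)        # remaining candidates are alphabet[lo:hi]
    bits = ""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if index < mid:              # "Is it in the first half?" -> yes
            bits += "1"
            hi = mid
        else:                        # -> no, so it is in the second half
            bits += "0"
            lo = mid
    return bits


# Hypothetical alphabet: 256 symbols in code point order.
alphabet = [chr(i) for i in range(256)]

answers = ask_questions(alphabet, "A")
print(answers, len(answers))

# Each symbol gets its own eight-answer record.
all_records = {ask_questions(alphabet, symbol) for symbol in alphabet}
print(len(all_records))
```

With this ordering and the book's "yes is 1" convention, the record for "A" comes out as 10111110, which is the bitwise complement of 65 written in binary (01000001). The eight answers identify a position in the ordering; which integer, and which bit pattern, goes with a character depends on the conventions chosen.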