Ask Professor Puzzler
Kelly from the United States has some questions that relate to our "Secret Messages and Codes" reference pages, about how different types of "things" get converted into hexadecimal and/or binary. This post will be an attempt to explore encoding a bit more.
In your computer's web browser, every color is made up of three numbers. Those three numbers are the red component of the color, the green component, and the blue component. Each of those numbers can range from 0 to 255, with 0 meaning, "Don't use this color," and 255 meaning "Give it all you've got of this color!" and all the numbers in between representing different gradations of that color.
So, for example, you could have a color represented like this: rgb(0,255,0), and that color would be green. You could represent another color like this: rgb(255,0,255). That color would be purple (full-blast red and blue, without any green). Or rgb(128,128,128). That color is equal amounts of all three components, which gives you gray.
rgb(255,255,255) is white, and rgb(0,0,0) is black.
However, that's not the only way to write colors for the computer to understand - you can take those three color components and write them in hexadecimal to tell the computer what the color is.
For example, if a website designer wanted to tell the browser to paint the screen dark purple, the designer could give the browser this number:
#440044
To see how this is purple, split it up into three 2-digit hexadecimal numbers:
#44, #00, #44
Since #44 = 68 (because #44 = 4 times sixteen, plus 4 more), and #00 = 0, this is identical to rgb(68,0,68).
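If you'd like to see that splitting-up done by a program, here is a small Python sketch. It takes the three pieces above written together as one six-digit color, #440044, and pulls out the red, green, and blue numbers. The function names hex_color_to_rgb and rgb_to_hex_color are made up for this example; they aren't part of any browser or of our reference pages.

```python
def hex_color_to_rgb(hex_color):
    """Split a color like '#440044' into its red, green, and blue numbers."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits, like #440044")
    # Each pair of hexadecimal digits is one component: red, green, blue.
    red = int(digits[0:2], 16)
    green = int(digits[2:4], 16)
    blue = int(digits[4:6], 16)
    return red, green, blue


def rgb_to_hex_color(red, green, blue):
    """Go the other way: three numbers from 0 to 255 become '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


print(hex_color_to_rgb("#440044"))   # (68, 0, 68)
print(rgb_to_hex_color(0, 255, 0))   # #00ff00
```

The first line printed matches the rgb(68,0,68) worked out by hand above, and the second is the hexadecimal way of writing the green from earlier.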
Encoding numbers - either into binary or hexadecimal - is done by the process outlined on our reference pages. It's a structured mathematical process by which we determine the largest power of the base (two or sixteen, for binary or hexadecimal respectively) and then work backwards to determine the digits of the number in a new base.
If I'm programming a computer, and I want the computer to multiply something by 51, my program includes the number 51, but the computer converts that to 110011 (binary) before storing it, because the computer does all its calculations in binary, and all its memory storage is binary.
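The process of finding the largest power of the base and then working backwards can be sketched in Python like this. Treat it as one possible way to write the steps down, not the exact procedure from the reference pages; the name to_base and the DIGITS string are invented for the example.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number, base):
    """Write a non-negative whole number in the given base (2 through 16)."""
    if number < 0 or not 2 <= base <= 16:
        raise ValueError("need a non-negative number and a base from 2 to 16")
    if number == 0:
        return "0"
    # Step 1: find the largest power of the base that fits in the number.
    power = 1
    while power * base <= number:
        power *= base
    # Step 2: work backwards through smaller and smaller powers,
    # recording how many times each one fits into what is left.
    result = ""
    while power >= 1:
        digit = number // power
        result += DIGITS[digit]
        number -= digit * power
        power //= base
    return result


print(to_base(51, 2))    # 110011
print(to_base(68, 16))   # 44
```

For 51 in binary, the largest power of two that fits is 32, and the digits come out as one 32, one 16, no 8, no 4, one 2, and one 1, which is 110011.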
Do you know what a grapheme is? It is defined as follows: "A grapheme is the smallest unit used in describing the writing system of any given language." Graphemes are the symbols we use to represent meaning.
Graphemes include letters, numbers, and all our punctuation marks - basically, all the characters on your keyboard. Graphemes, like numbers, need to be converted into a numerical language the computer can understand if we want them to be stored in the computer. Technically, they are stored in binary, but we often use hexadecimal to represent graphemes.
But how in the world do you convert something like "&" to a number? There isn't a mathematical process you can use to do that, is there? No, there isn't. There's also no mathematical process for converting the letter "A" to hexadecimal. And, to make your head spin a little, there's also no mathematical process for converting "1" into hexadecimal.
Notice that I put the number "1" in quotes. Because it's not the number 1, it's the grapheme for the number 1.
So if there's no good mathematical method for doing this conversion, what do we do? We just create a table of values, and arbitrarily assign a number to each grapheme. For example,
"A" = 41
"B" = 42
"C" = 43
SPACE = 20
"1" = 31
"2" = 32
"&" = 26
Note that all of these are hexadecimal values. It may be strange to wrap your brain around the fact that "1" = 31. Mind you, I'm not saying 1 = 31. I'm saying "the grapheme for the number 1 has a value of 31 (hex)." If I tried to tell the computer that one equals 31, I think it would have a meltdown!
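You can look these values up yourself in Python, whose built-in ord function returns the number assigned to a character (Python uses the Unicode table, which agrees with the older ASCII table for the characters listed here). The helper name char_to_hex is made up for this sketch.

```python
def char_to_hex(character):
    """Look up one character's assigned number and write it in hexadecimal."""
    return format(ord(character), "02X")


for character in ["A", "B", "C", " ", "1", "2", "&"]:
    print(repr(character), "=", char_to_hex(character))

# The grapheme "1" and the number 1 are different things to the computer:
print(ord("1"))   # 49, which is 31 in hexadecimal
print(int("1"))   # 1, the actual number
```

Notice that nothing here is calculated from the shape or meaning of the character. The program simply looks the value up in a table.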
Incidentally, in computer programming, we don't call them "graphemes" - we call them "characters." And we don't limit ourselves to visible symbols; we also have character codes for things like the backspace key, the tab key, and the arrow keys.
Syntax Determines Encoding
Here's where we finally get around to answering Kelly's questions, which are along the lines of "If I enter 'PURPLE' in the hexadecimal encoder, why doesn't it give me the color values you describe? And if I enter the number 52 in the binary encoder, why doesn't it give me the value you say it will?"
And the answer is: Our encoder is an encoder for graphemes, rather than an encoder for colors and numbers.
So when you entered "PURPLE," you thought you were saying to the computer, "Show me the hexadecimal value for the color purple," but really you were saying, "Show me the hexadecimal values for the graphemes 'P', 'U', 'R', 'P', 'L', and 'E'."
Similarly, when you entered "52," you thought you were saying to the computer, "Show me the binary value of the number 52," but really, you were saying, "Show me the binary values for the graphemes '5' and '2'."
The key is, we tell you "enter some text" - whenever we are entering "text," we are entering graphemes for the computer to deal with.
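Here is a rough Python imitation of what a text encoder does with those two entries. It is only a sketch: the function names are invented, and the spacing and the eight-digit padding of each binary value are assumptions, so the encoder on our site may lay its output out differently.

```python
def text_to_hex(text):
    """Encode each character of the text as a two-digit hexadecimal value."""
    return " ".join(format(ord(character), "02X") for character in text)


def text_to_binary(text):
    """Encode each character of the text as an eight-digit binary value."""
    return " ".join(format(ord(character), "08b") for character in text)


# "PURPLE" treated as six graphemes, not as a color:
print(text_to_hex("PURPLE"))   # 50 55 52 50 4C 45

# "52" treated as two graphemes, not as a number:
print(text_to_binary("52"))    # 00110101 00110010

# The number 52 itself, converted mathematically:
print(format(52, "b"))         # 110100
```

The last two lines are the heart of the matter: the text "52" and the number 52 come out completely different, because one is a table lookup for each grapheme and the other is a base conversion.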
In computer programming, and in web design, we have to be able to specify how we want to have the computer interpret the content we provide - whether we want the computer to interpret it as colors, as numbers, or as text. This is why programming languages have syntax, or programming "grammar" - the grammar of a programming language helps the computer determine how we want to have things understood.
It's just like English grammar, in a way. Consider the word "content" used in the paragraph above. "Content" can be either a noun or an adjective ("I added content to the page," vs. "I am content with this page."). But when you read the sentence above, you probably didn't even hesitate to understand how I intended you to interpret the word "content." Why? Because the structure/grammar/syntax of the sentence made it clear how the word was to be interpreted.
In the same way, in the world of computer programming and web design, we use syntax to indicate how we want information to be interpreted. One of the simplest rules is: If we put it in quotes, we want it interpreted as graphemes. Of course, nothing can be that simple. A quotation mark is a grapheme, so what if you want a quotation mark inside the text? How does the computer know whether to interpret that quotation mark as the end of the text, or as part of the text? The rules can get messy!
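As one example of how a language settles the quotation mark question, here is how Python does it (other languages have their own rules). A backslash in front of a quotation mark says "this one is part of the text," or you can wrap the text in the other kind of quote. The variable names are just for illustration.

```python
# A backslash tells Python the next quotation mark belongs to the text.
escaped = "She said \"hi\""

# Wrapping the text in single quotes lets double quotes appear freely.
single_quoted = 'She said "hi"'

print(escaped)                    # She said "hi"
print(escaped == single_quoted)   # True
print(len(escaped))               # 13 (the backslashes are not stored)
```

Both ways of writing it produce exactly the same thirteen characters; only the syntax used to type them differs.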
So for our encoder, since we've said, "enter some text," your entire entry in the textbox is treated as content to be interpreted as graphemes.
I hope that lengthy explanation added some clarity for you! Thanks for asking.
Here's another blog post that explores encoding some more: Letter codes and binary.