Saddening Goat: Crocodilian scleroderma

crocodilian (adj.): belonging to the order of large reptiles, Crocodilia

scleroderma (n.): a chronic disease characterized by excessive deposits of collagen in the skin or other organs.

By the way, crocodilians are the closest living relatives of birds: both are archosaurs, and birds themselves are descended from dinosaurs.

Anyway, I felt like posting, so since I've been programming a bit of stegano stuff in Python recently, I think I'll move on from swords to steganography.

Steganography ('stegano' for short), for those who have no cryptography experience whatsoever, is probably best explained like this: Let's say you want to send a secret message to your friend containing, say, the word "crocodilian". This is known as the plaintext - it's the actual information that you want to send to him. But the evil enemies want to know what you're sending to your friend too, and with their unlimited resources they can intercept the messenger or mailman or telephone or e-mail or any other means of communication you can think of. So, naturally, you encrypt it and turn it into ciphered text (often shortened to just ciphertext) by, say, shifting every letter one letter forward: "dspdpejmjbo". Of course, your friend needs to know that everything is shifted one letter forward too, but presumably you arrange this ahead of time. Anyway, you now send this to your friend and it gets intercepted by the evil enemies. Now, the evil enemies notice something fishy - why would you bother sending your friend a string of gibberish letters like "dspdpejmjbo"? So, suspecting something, with their multi-million dollar budget they hire a crack team of researchers who manage to crack your code in a few years - and your secret has fallen!
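For the curious, the one-letter shift can be sketched in a few lines of Python. This is only an illustration: the name shift_letters is made up here, and it assumes lowercase a-z, with z wrapping around to a.

```python
def shift_letters(text, amount=1):
    """Shift every lowercase letter forward by `amount`, wrapping z back to a."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + amount) % 26 + ord("a")))
        else:
            shifted.append(ch)  # leave anything that isn't a-z untouched
    return "".join(shifted)

ciphertext = shift_letters("crocodilian", 1)
print(ciphertext)                     # dspdpejmjbo
print(shift_letters(ciphertext, -1))  # crocodilian
```

Shifting by -1 is all your friend has to do to get the plaintext back, which is also why the evil enemies' researchers shouldn't really need years.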

Of course, one way to avoid this is to simply make the cipher trickier (two spaces forward!). But another way (and the way related to steganography) is to just not make them suspicious. We could do this, say, by making an English sentence where the first letter of every word corresponds to the letters in "dspdpejmjbo" - such as: "Do stupid peons derail planes entering jam made joyously by Olga?" When the evil enemies intercept this, they would, of course, find nothing unusual about the sentence and just let it pass through. After all, they can't spend a multi-million dollar budget analyzing every message that comes their way for hidden information.
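On the receiving end, pulling the ciphertext back out of that sentence is a one-liner. A small sketch (first_letters is a hypothetical helper name, and it assumes words are separated by whitespace):

```python
def first_letters(sentence):
    """Collect the first character of every whitespace-separated word, lowercased."""
    return "".join(word[0].lower() for word in sentence.split())

cover = "Do stupid peons derail planes entering jam made joyously by Olga?"
print(first_letters(cover))  # dspdpejmjbo
```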

So, in short, ciphering is hiding information; steganography is hiding the fact that you're hiding information.

Steganography is a pretty cool part of cryptography in that there are so many possible ways to do it. You might have noticed that the sentence I constructed above is pretty fishy (i.e. they might eventually notice something was a bit peculiar about it). Also, to hide even a short sentence in that manner, you'd need almost an entire paragraph (if not more). So nowadays, most stegano stuff is done by hiding the information digitally in the insignificant layers of files.

Take the above black box, for example. If you copy it into Paint (or an equivalent) and play around with it for a while, you'll notice that it does contain the information "dspdpejmjbo". This is much subtler than "Do stupid peons derail planes entering jam made joyously by Olga?", since now you can hide a message inside any block of uniform color. You can even hide the block too - you could include it surreptitiously at the bottom of a photograph or something.

But the evil enemies are getting smarter, and have just written a computer program that checks all picture files for blocks of uniform color, and then tries floodfilling the block with various colors to check for hidden text. So how do we get around them now? There's a very nice way to do this, but it is slightly more technical.

Of foremost importance is understanding how .bmp (or any other graphics extension of your choice, although they tend to be much more complex) files are stored. Luckily, with a hex editor it's not very hard to find out (if you don't know what a hex editor is, go learn to use a computer).

Open some arbitrary bitmap file (the one below, for example) in a hex editor.

If you do open it, you'll see lots and lots of bytes represented in hexadecimal. Hexadecimal is the usual choice because 16^2 is 2^8, so one byte goes to exactly two hexadecimal characters. Similarly, single-byte character codes range from 0-255 (plain ASCII only uses 0-127 of those); you can express one ASCII character with two hexadecimal characters or in one byte.

Now, if you're not used to hex cracking stuff, it'll probably look a little foreign. But if you play around, you should be able to figure out what most of the things are. For example, the first 2 bytes in a .bmp file are always "42 4D" - representing the ASCII "BM". This basically helps programs identify it is a bitmap (a nice trick is, if you are ever given a mystery file, open it in a hex editor and look at the first two bytes - they often identify the format of the file).
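If you'd rather not squint at a hex editor, the same check is easy to sketch in Python. The function name looks_like_bmp and the sample byte strings are made up for illustration; a real check would read the first two bytes of an actual file.

```python
def looks_like_bmp(data):
    """Return True if the data starts with the bytes 42 4D (ASCII "BM")."""
    return data[:2] == b"BM"

print(bytes.fromhex("424D"))                 # b'BM'
print(looks_like_bmp(b"BM" + bytes(52)))     # True
print(looks_like_bmp(b"\x89PNG\r\n\x1a\n"))  # False (that's how a PNG starts)
```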

Then, bytes 3-6 denote the file size of the bitmap, in bytes (yes, there are only 4 bytes for this - we too hope that you don't feel like making a 4.3 GB bitmap file anytime in the near future). Now, if you look at the file size of something, you'll note that it's represented a bit oddly. By that I mean, they put the least significant bytes first.

What does that mean? It's basically akin to writing the number 7543 as 3457 - here the least significant digit, 3, appears first. The two hex digits inside each byte are not flipped, though, so if you see "5E", that really means 5*16+14 = 94, not 14*16+5 = 229. Altogether, that means that something like "5E 26 00 00" actually represents the number 26*(16^2) + 5E = 9822 - not 24102, and definitely not 1579548672 (by the way, forgive the flipping between hexadecimal and decimal so often - only 26 and 5E are in hex there).
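Here is that arithmetic as a quick Python check, first by hand and then with the built-in int.from_bytes. The variable names are just for illustration.

```python
size_bytes = bytes.fromhex("5E260000")

# By hand: the first byte counts units, the next counts 256s, then 256^2s, ...
by_hand = sum(value * 256**position for position, value in enumerate(size_bytes))
print(by_hand)                               # 9822

# The standard library does the same thing when told "little" endian.
print(int.from_bytes(size_bytes, "little"))  # 9822

# Reading the same bytes most-significant-first gives the "definitely not" number.
print(int.from_bytes(size_bytes, "big"))     # 1579548672
```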

There's a lot more information encoded like this, like the height of the picture, the width of the picture, etc. However, we want to get to the interesting stuff - the actual picture. So skip ahead to offset 0A (counting from zero, so the 11th byte). The 4 bytes there tell you the offset at which the actual image information starts. For my computer it's usually 36 in hex - so offset 54. But it could change depending on which program saved the file, its settings, the actual image (a color palette, for instance, pushes it later), etc.
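To make the header layout concrete, here is a sketch that builds a made-up 14-byte file header in memory and reads the two fields discussed so far with Python's struct module. The header contents (size 9822, data starting at 54) are assumptions chosen to match the numbers in this post, not bytes from a real file.

```python
import struct

# Hypothetical header: "BM", file size, two reserved 16-bit fields, data offset.
# "<" means little-endian with no padding between fields.
header = b"BM" + struct.pack("<IHHI", 9822, 0, 0, 54)
print(" ".join(f"{b:02X}" for b in header))
# 42 4D 5E 26 00 00 00 00 00 00 36 00 00 00

(file_size,) = struct.unpack_from("<I", header, 2)       # bytes 3-6
(data_offset,) = struct.unpack_from("<I", header, 0x0A)  # the 4 bytes at offset 0A
print(file_size, data_offset)                             # 9822 54
```

With a real bitmap you would read the first 14 bytes of the file instead of building them, and the rest works the same way.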

Now, you'll see familiar triplets of bytes - they encode the RGB values. So, if the three bytes starting at offset 54 are "00 FF 00", that means that the RGB of that pixel is "Red: 0, Green: 255, Blue: 0", so it's a green pixel. (Actually, it's really the BGR values - the first "00" is the blue value - but we don't need to worry about this since we treat everything the same.)
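A tiny sketch of reading one such triplet, assuming a 24-bit bitmap with the bytes stored blue, green, red. The helper name decode_pixel and the second sample value are made up for illustration.

```python
def decode_pixel(three_bytes):
    """Split a stored 3-byte pixel (blue, green, red order) into named channels."""
    blue, green, red = three_bytes
    return {"red": red, "green": green, "blue": blue}

print(decode_pixel(bytes.fromhex("00FF00")))  # {'red': 0, 'green': 255, 'blue': 0}
print(decode_pixel(bytes.fromhex("FFFF00")))  # {'red': 0, 'green': 255, 'blue': 255}
```

The second one is cyan (full blue and green, no red), and it shows where the byte order matters: read as RGB it would come out yellow.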

The observant person might have noticed that when I said "that pixel", I never really specified what pixel I was talking about. Now, you might intelligently assume that the first pixel would be the pixel in the top-left corner - but you'd be wrong. After all, if you look at the image, the pixel in the top left corner is cyan, not green. So hmm...

For some reason (hey, don't ask me) bitmaps represent the picture data from the bottom up - so the first pixel is actually the bottom-left corner. Then the next pixels are all along the bottom-most row, then they proceed to the second bottom-most row, etc. all the way to the top row.

Of course, that's not nearly annoying enough. To make things even better, they force the number of bytes in each row to be a multiple of 4 - so you'll find that if you have 6 pixels in a row (in a 24-bit bitmap), you'll see the 18 bytes of RGB's for those 6 pixels, followed by 2 padding bytes, "00 00".
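Putting the bottom-up ordering and the row padding together, here is a sketch of where a given pixel lives in the file. It assumes a 24-bit bitmap (3 bytes per pixel), rows padded up to a multiple of 4 bytes, rows stored bottom-up, and pixel data starting at offset 54; the function names are hypothetical.

```python
def row_stride(width, bytes_per_pixel=3):
    """Bytes taken up by one stored row, rounded up to a multiple of 4."""
    return (width * bytes_per_pixel + 3) // 4 * 4

def pixel_offset(x, y_from_top, width, height, data_offset=54):
    """File offset of the pixel at column x, counting rows from the top of the image."""
    row_from_bottom = height - 1 - y_from_top  # the bottom row is stored first
    return data_offset + row_from_bottom * row_stride(width) + x * 3

print(row_stride(6))             # 20: 18 bytes of pixels plus 2 bytes of padding
print(pixel_offset(0, 3, 6, 4))  # 54: bottom-left pixel of a 6x4 image comes first
print(pixel_offset(0, 0, 6, 4))  # 114: the top-left pixel sits in the last stored row
```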

Luckily, none of this matters to us, since we just want to hide "dspdpejmjbo" in the data. The first step is to translate that to an equivalent hex string: "6473706470656A6D6A626F00". Now, we're going to break up that hex string into triplets of characters: "647 370 647 065 6A6 D6A 626 F00"
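That translation step is easy to sketch in Python. The helper name to_hex_triplets is made up, treating the trailing "00" as an end-of-message marker is an assumption, and so is padding with extra "0" digits when the length isn't a multiple of 3 (it isn't needed for this message).

```python
def to_hex_triplets(message):
    """Turn an ASCII message into a hex string (plus a 00 terminator) and 3-digit chunks."""
    hex_string = (message.encode("ascii") + b"\x00").hex().upper()
    hex_string += "0" * (-len(hex_string) % 3)  # assumption: zero-pad to a multiple of 3
    triplets = [hex_string[i:i + 3] for i in range(0, len(hex_string), 3)]
    return hex_string, triplets

hex_string, triplets = to_hex_triplets("dspdpejmjbo")
print(hex_string)          # 6473706470656A6D6A626F00
print(" ".join(triplets))  # 647 370 647 065 6A6 D6A 626 F00
```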

Now what do we do? Well, consider the R hex value for some random pixel, say "A2". This represents a red value of 162. If we changed it to, say, "A6", you probably couldn't tell the difference. But if we changed it to something like "62", you probably could - the red value has changed by 64.

We call the "2" the least significant bit (although it's really 4 bits, i.e. one hex digit, but shh). We basically are just going to replace the last 4 bits of each color byte with one hex digit from our message. So for example, if our first pixel was "A2 B3 C4" it would get changed to "A6 B4 C7", since our first triplet is "647".
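In code, swapping out the last hex digit is a mask and an OR. A minimal sketch for a single pixel (embed_triplet and extract_triplet are hypothetical names, and this is not necessarily how any particular stegano program does it):

```python
def embed_triplet(pixel, triplet):
    """Replace the low 4 bits of each of the 3 pixel bytes with one hex digit."""
    return bytes((value & 0xF0) | int(digit, 16) for value, digit in zip(pixel, triplet))

def extract_triplet(pixel):
    """Read the low 4 bits of each pixel byte back out as hex digits."""
    return "".join(f"{value & 0x0F:X}" for value in pixel)

original = bytes.fromhex("A2B3C4")
stego = embed_triplet(original, "647")
print(stego.hex().upper())     # A6B4C7
print(extract_triplet(stego))  # 647
```

The `& 0xF0` keeps the top hex digit of the color and clears the bottom one, which is why the overall color barely moves.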

And when we do that, the new picture is:

Now, our message wasn't very large, so all of it managed to hide in the bottom row. You can notice the steganography in Paint, since we are shifting the color values by as much as 15 in either direction. But nevertheless, it's a pretty nice way of hiding information. And I bet the evil enemies wouldn't suspect anything!
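As a rough end-to-end illustration, here is the whole round trip on a flat run of pixel bytes. It is only a sketch under several assumptions: the names hide and reveal are made up, the cover bytes are invented rather than taken from a real bitmap, the message ends at a 00 byte, and row padding is ignored (the digits just go into consecutive bytes).

```python
def hide(pixels, message):
    """Write the message, one hex digit per byte, into the low 4 bits of `pixels`."""
    digits = (message.encode("ascii") + b"\x00").hex()
    if len(digits) > len(pixels):
        raise ValueError("message too long for this pixel data")
    out = bytearray(pixels)
    for i, digit in enumerate(digits):
        out[i] = (out[i] & 0xF0) | int(digit, 16)
    return bytes(out)

def reveal(pixels):
    """Rebuild bytes from the low 4 bits and stop at the first 00 byte."""
    digits = "".join(f"{value & 0x0F:x}" for value in pixels)
    data = bytes.fromhex(digits[: len(digits) // 2 * 2])
    return data.split(b"\x00", 1)[0].decode("ascii")

cover = bytes(range(100, 160))  # 60 made-up bytes standing in for pixel data
stego = hide(cover, "dspdpejmjbo")
print(reveal(stego))            # dspdpejmjbo
print(max(abs(a - b) for a, b in zip(cover, stego)))  # never more than 15
```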

I'm going to go play Halo 3 with somebody now, so I'll stop now, but I'll post more about this later - like my program that automates the process, how you can actually detect stegano, and how to hide pictures within pictures.

Until then, ciao.

-squidout

February 17, 2008
