Last Sunday I decided to play a little with graphical interpretation of files again. Graphical interpretation, or visualization as one may call it, is a large topic. There are even some interesting sites dedicated to it, whose authors present colorful bitmaps representing files, commonly made by copying file bytes directly into the Red, Green and Blue channels. In my case, however, the bytes are not mapped to RGB; instead, I chose to map them to X and Y.

The idea is, as always, simple: we take a white 256 x 256 bitmap and then, for each byte of the file, we take that byte and the next one, treat the first as the X coordinate and the second as the Y coordinate, and darken the pixel at (X,Y) a little. The pixel ends up dark when a given pair of bytes occurs many times, and stays white if the pair occurs rarely or not at all. Since the pairs overlap, a file yields file_size - 1 byte pairs that together create an image.
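The counting step can be sketched in a few lines of Python. This is not the code from file2d.zip, just a minimal illustration of the idea; the function name `count_byte_pairs` and the `counts[y][x]` layout are my own assumptions.

```python
def count_byte_pairs(data: bytes) -> list[list[int]]:
    """Return a 256 x 256 table; counts[y][x] is how often byte x is followed by byte y."""
    counts = [[0] * 256 for _ in range(256)]
    # zip(data, data[1:]) yields the file_size - 1 overlapping pairs.
    for x, y in zip(data, data[1:]):
        counts[y][x] += 1
    return counts


# Tiny in-memory example; for a real file, pass open(path, "rb").read() instead.
sample = b"ABABAB"
counts = count_byte_pairs(sample)
print(counts[ord("B")][ord("A")])  # occurrences of the pair (A, B)
print(counts[ord("A")][ord("B")])  # occurrences of the pair (B, A)
```

Each cell of the table corresponds to one pixel of the 256 x 256 bitmap, so the table is all that is needed to draw the image later.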

Because everything sounds simple, let's add some "hardeners":
1) let's use a logarithmic scale when mapping the pair occurrence count to a color
2) let's normalize the colors/occurrence counts, so that the minimal occurrence count (not always 0) is always white, and the maximum occurrence count is always black
3) let's use some colors (sepia or something similar)
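The three "hardeners" could look roughly like the sketch below. It is an illustration under assumptions, not the original implementation: the name `counts_to_pixels`, the use of `log1p` (so that a count of 0 is well defined), and the dark brown `tint` standing in for black are all my own choices.

```python
import math


def counts_to_pixels(counts, tint=(112, 66, 20)):
    """Map a table of pair counts to rows of (r, g, b) tuples."""
    flat = [c for row in counts for c in row]
    # Logarithmic scale; log1p(c) = log(1 + c) keeps a count of 0 valid.
    lo = math.log1p(min(flat))
    hi = math.log1p(max(flat))
    span = hi - lo
    pixels = []
    for row in counts:
        out = []
        for c in row:
            # Normalize: t is 0.0 for the rarest pair, 1.0 for the most common.
            t = (math.log1p(c) - lo) / span if span else 0.0
            # Blend each channel from white (255) towards the tint color.
            out.append(tuple(round(255 + (ch - 255) * t) for ch in tint))
        pixels.append(out)
    return pixels


# Hypothetical 2 x 2 table, just to show the mapping.
pixels = counts_to_pixels([[0, 1], [10, 100]])
print(pixels[0][0])  # rarest pair: white
print(pixels[1][1])  # most common pair: the tint color
```

With the logarithmic scale, a pair that occurs only once is already visibly darker than white, even when some other pair occurs thousands of times; a linear scale would leave most of the image almost blank.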

OK. Now that we have the app ready, let's feed it all the files we can find on the hard drive, and see the results!

Let's start with some Vista system32 files:

localsec.dll

This is a standard boring bitmap. However, not all bitmaps are so boring! Let's look at another one:

slcc.dll

Some "spider legs" appear. Interesting... let's search some more...

ssBranded.scr

The above object is a little more interesting, with visible tendencies. Some other files:

8point1.wav

aurora.scr

locale.nls

And, one of the most interesting DLL files I've found:

spwizimg.dll

Huh! Is it a bird? Is it Batman? No! It's neither a bird, nor Batman, nor any other devil (if it were a devil, I'm sure someone would digg this post with a title like 'Hidden satan finally found in Windows Vista!' ;D). These are just BMP files in the resources, containing gradient-like bitmaps of buttons and icons. It so happens that images like photos, drawings, etc., converted into BMP/RAW/TGA or some other uncompressed format and fed to the described formula give veeeery interesting results! Let's look at a couple of bitmaps representing gfx files:

some image

some image

some image

I've placed a full gallery of interesting (imho) files here: Full Gallery

The source code (ugly, as always) + executable: file2d.zip (ZIP SRC+BIN, 7kb) (BSD-style license)

If you find an interesting visualization, leave a link in the comments :) (also take a look at the comments to this post on the Polish side of this blog).

And that's it.

P.S. I've just realized that my blog was entered into the "CONFidence Security Evangelist" competition in the category "A Polish-language blog about IT security". Huh ;) Thx guys, I'm really positively surprised ;) However, I admit that I think my blog doesn't fit there well, since only about 25% of the things I write about are somewhat related to security ;)

P.S.2. A friend of mine has shown me a message he received on GG (a Polish-originating IM) that goes like "I love you... http://www.wyznanie.mx.tc". On the destination page (that looks almost pro ;p) the reader is urged to send an SMS to get information about WHO loves him. Of course, at the bottom of the page there is a note about the price of such an SMS, which costs over 23 PLN (that's over 5 EUR, and over 7 USD)! I'll just add that this message is sent to everyone by a bot, and it's just a scam. Crazy idea; however, I'm really interested in knowing how many people fell for this.

Comments:

2009-06-05 18:56:01 = qubodup
{
Hello,

Would you be ok with licensing this program under http://www.opensource.org/licenses/mit-license.php or some other open source license? If yes, I'd be super glad and ask someone to port it to gcc, so I can waste my time, analyzing my files on my linux machine :)

Cheers!
}
2009-06-06 08:02:48 = Gynvael Coldwind
{
@qubodup
Hi,
I've set the license to BSD-style (details here http://gynvael.coldwind.pl/?id=203).
Have fun ;>
}
2009-06-07 11:39:09 = qubodup
{
Wheeeeeeee! *squeals like a girl*
}
2009-06-07 19:04:22 = qubodup
{
Cool! A forum buddy created a linux (and supposedly windows-working) port: http://forum.freegamedev.net/index.php?t=msg&goto=20000
}
2009-06-07 20:12:07 = Gynvael Coldwind
{
@qubodup
Thx for the link! ;>
}
2009-06-15 05:18:18 = vade
{
Hi, this is really nice. A friend of mine pointed me to this page as I've done some animated file visualization via a plugin I wrote for OS X using simple RGB mapping, however your idea to use the bytes as coordinates is quite nice. I may have to check that out. I also did a movie of an interactive memory viewing app. If you are interested, the URLs are here:

http://vimeo.com/2699248

http://vimeo.com/2757162

and code at http://002.vade.info :)

I think the coordinate + animated offset would look really hot.
}
2009-06-16 15:54:24 = Gynvael Coldwind
{
@vade
Hi, I'm glad you like it ;>
Anyway, the stuff you've got at 002.vade.info is awesome!

Hmm, I thought about animating some things some time ago, and you just reassured me to do it ;>

Take care,
}
2009-07-16 14:22:03 = lallous
{
Thanks for sharing, very nice!!!
}
2017-10-31 04:30:19 = naisanza
{
What if you wanted to produce a 3D (n-gram) block? Would (byte n, byte n+1) still be (x, y), respectively? And how would you handle the Z-axis from the beginning of the binary file, to the end of the file? Would each step just be 16 bytes (the length of the hex alphabet)?
}
2017-11-01 10:48:12 = Gynvael Coldwind
{
@naisanza
If you have the "Silence on the Wire" book, a possible method is described there: it suggests taking (byte n, byte n+1, byte n+2) as (x, y, z) respectively.
}
