A command line image reader

Rainbow ASCII art

After some semi-serious posts about prime numbers and factorization, now it’s time for something lighter, related to the C ASCII webcam post.

Hacking NASA…

Suppose you are hacking the NASA website using HTML. Obviously from the text browser lynx, since you are a hacker and you only use the Linux terminal. Even better, in order of preference, you are hacking NASA:

  • using nano on a file downloaded with wget
  • using vi on a file downloaded with curl (for your mental sanity, remember that you quit without saving with :q! and save and quit with :wq, after pressing Esc to leave the insert mode you entered with i)
  • using dd (that should also be your favorite HD partitioning tool) on a file downloaded with a handcrafted C program.

Everything is going fine until you find images proving the presence of aliens in Area 51, but – damn – on the CLI you can’t see the images, just their raw bytes with something like hexdump -C <filename>:

hexdump example

Unfortunately your on-the-fly visual decoding capability is not as good as that of the operators in The Matrix, so you risk being discovered: you need a faster way to see the images.

Data representation

Before proceeding, let’s recall something about image files, starting from the basics. A storage device contains data. In the analog domain, data are usually continuous signals stored on magnetic tape or similar media, but in the digital world data are sequences of 0s and 1s, the so-called bits (binary digits). The underlying physical layer is still continuous, but it is interpreted as discrete: every signal above a certain threshold is read as 1, and every signal below it as 0.

Bits are then organized in groups: 8 bits make a byte, a piece of memory that can hold 2^8 = 256 different values, from 0 to 255. Four bytes are often called a word (the exact word size depends on the architecture) and can hold 2^32 values, and so on. Bytes have been associated with numbers, letters and symbols, so that binary strings can also represent our alphabet. This mapping between bytes and letters is known as character encoding, and over time several encodings have been used, among them ASCII, UTF-8 and UTF-16.
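To make these numbers concrete, here is a small Python sketch (Python is used for the examples in this post only because it is compact; the program presented later is written in C). It counts the values that fit in one and in four bytes, and shows how the same text becomes different byte sequences under different encodings. The sample string and the helper name to_bits are arbitrary choices for illustration.

```python
# Number of distinct values that fit in n bits: 2 ** n
values_in_byte = 2 ** 8       # one byte
values_in_4_bytes = 2 ** 32   # four bytes


def to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, separated by spaces."""
    return " ".join(f"{b:08b}" for b in data)


text = "ET"  # arbitrary sample string
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

if __name__ == "__main__":
    print(values_in_byte, values_in_4_bytes)  # 256 4294967296
    print(to_bits(ascii_bytes))               # 01000101 01010100
    print(utf8_bytes.hex(" "))                # 45 54
    print(utf16_bytes.hex(" "))               # 45 00 54 00
```

For plain English letters ASCII and UTF-8 give the same bytes, while UTF-16 uses two bytes per character here.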

Computer science usually starts from this low level, strings of binary digits, but just below it lies the fantastic domain of electronics and physics: capacitors, resistors, transistors, circuits that implement logical functions… We won’t enter this world, but knowing how it works is necessary for a deep understanding of computer science.

File systems and file formats

A file is a binary string on a storage device representing a document, an image, a video or some other content. But how can we find a file on the device? Usually the first part of the device holds an index of the files it contains, and this index is part of the file system. Among the first file systems there were simple tables at the beginning of the disk listing the files; they were called File Allocation Tables, or FAT.

File systems evolved to be more resilient and faster. On traditionally partitioned disks the first sector holds a special record that is read when the computer boots, the Master Boot Record (MBR), which contains boot code and a partition table pointing to the file system(s) on the device.

Similarly, each file type stores its data in a specific way so that programs can read it. Some files have no structure, like txt files, which are plain text; other files have a specific structure that can be dynamic or static. Take an XML file: it is structured, but its tags differ from file to file, so its structure is dynamic. Some media files, on the other hand, have a header with information on resolution, bitrate, etc., followed by data laid out according to the header parameters.

Files with a static structure usually start with a constant (the signature of the file, or magic number) that helps programs recognize them. For example, BMP images start with BM, while old Microsoft binary documents such as DOC and XLS start with the hexadecimal bytes D0 CF 11 E0 (= “docfile” in hacker script), and so on.
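Checking a signature takes only a few lines. The sketch below knows just the two signatures mentioned above; the table SIGNATURES and the function names are hypothetical, and a real tool (like the file command) knows many more patterns.

```python
# Signatures mentioned in the text; this table is illustrative, not complete.
SIGNATURES = {
    b"BM": "BMP image",
    b"\xD0\xCF\x11\xE0": "old Microsoft binary document (DOC, XLS, ...)",
}


def guess_type(first_bytes: bytes) -> str:
    """Return a description for the first known signature that matches."""
    for magic, description in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return description
    return "unknown"


def guess_file_type(path: str) -> str:
    """Read only the first few bytes of a file and guess its type."""
    with open(path, "rb") as f:
        return guess_type(f.read(8))


if __name__ == "__main__":
    print(guess_type(b"BM\x36\x00\x00\x00"))
    print(guess_type(b"\xD0\xCF\x11\xE0\xA1\xB1"))
    print(guess_type(b"hello"))
```

Note that the file name and extension play no role here: only the first bytes of the content are inspected.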

In any case, every file has an entry in the file system that allows the operating system to retrieve it when necessary.

File recovery

If the MBR or a file system gets compromised (e.g., by a hardware failure or an accidental deletion…), data can be lost, since we no longer know where the files are located on the device. There are programs specialized in recovering damaged data; a common technique is searching for byte patterns typical of the files we are trying to recover. For example, the first bytes of a JPEG are FF D8 FF and its last bytes are FF D9, so we could find candidate JPEGs on a drive by searching for a regex like this one (with a non-greedy .*? so that each match stops at the first end marker, and with the regex engine set so that . matches any byte, newlines included):

\xFF\xD8\xFF(.*?)\xFF\xD9

There are programs that can parse a disk with a damaged FAT or damaged sectors, like icat, and help you recover lost data, but I’ll present this topic in the future.
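The signature search described above can be sketched in Python as follows. This is only a toy: it assumes the whole "disk" fits in memory and that files are stored contiguously, and it can cut an image short when the bytes FF D9 also appear inside it (for example in an embedded thumbnail). The names JPEG_PATTERN and carve_jpegs are made up for this example.

```python
import re

# Non-greedy match from the JPEG start marker to the first end marker.
# re.DOTALL lets "." match every byte, including newline (0x0A).
JPEG_PATTERN = re.compile(rb"\xFF\xD8\xFF.*?\xFF\xD9", re.DOTALL)


def carve_jpegs(raw: bytes) -> list:
    """Return every byte run that looks like a JPEG file."""
    return [m.group(0) for m in JPEG_PATTERN.finditer(raw)]


if __name__ == "__main__":
    # Fake "disk": junk, two tiny pretend JPEGs, more junk.
    disk = (b"junk"
            + b"\xFF\xD8\xFF\xE0" + b"AAAA\n" + b"\xFF\xD9"
            + b"\x00\x00"
            + b"\xFF\xD8\xFF\xE1" + b"BB" + b"\xFF\xD9"
            + b"tail")
    for i, blob in enumerate(carve_jpegs(disk)):
        print(i, len(blob), blob[:4].hex(" "))
```

With a greedy .* the two pretend images would be merged into a single match running from the first start marker to the last end marker, which is why the non-greedy form is used.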

BMP file format

All this chatting only to say that files of a specific format usually have a specific structure, and the BMP file structure in particular is quite simple. It has 3 main parts: a 54-byte header, possibly a palette, and the image data.

24-bit BMPs don’t have a palette, just a sequence of color values in the image data section (one byte each for blue, green and red, in that order), while black-and-white or indexed-color images have a palette, and their image data is just the sequence of pixels expressed as indexes into the palette.
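As a rough illustration of the header layout, the following Python sketch decodes the 54 bytes with the struct module. It assumes the most common variant, a 14-byte file header followed by the 40-byte BITMAPINFOHEADER; other header versions exist and are not handled. The function names and the make_header helper (used only to have something to parse) are my own and do not come from the program linked later.

```python
import struct

FILE_HEADER = struct.Struct("<2sIHHI")       # 14 bytes, little-endian
INFO_HEADER = struct.Struct("<IiiHHIIiiII")  # 40 bytes (BITMAPINFOHEADER)


def parse_bmp_header(data: bytes) -> dict:
    """Decode the 54-byte header at the start of a common BMP file."""
    magic, file_size, _, _, pixel_offset = FILE_HEADER.unpack_from(data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    (info_size, width, height, planes, bpp, compression,
     *_rest) = INFO_HEADER.unpack_from(data, FILE_HEADER.size)
    # Each pixel row is padded to a multiple of 4 bytes.
    row_size = (width * bpp + 31) // 32 * 4
    return {"file_size": file_size, "pixel_offset": pixel_offset,
            "width": width, "height": height, "bits_per_pixel": bpp,
            "compression": compression, "row_size": row_size}


def make_header(width: int, height: int) -> bytes:
    """Build a header for an uncompressed 24-bit BMP (handy for testing)."""
    row_size = (width * 24 + 31) // 32 * 4
    image_size = row_size * height
    return (FILE_HEADER.pack(b"BM", 54 + image_size, 0, 0, 54)
            + INFO_HEADER.pack(40, width, height, 1, 24, 0,
                               image_size, 2835, 2835, 0, 0))


if __name__ == "__main__":
    print(parse_bmp_header(make_header(3, 2)))
```

Two details matter when reading the pixels afterwards: rows are padded to a multiple of 4 bytes, and with a positive height the rows are stored bottom-up, so the first row in the file is the last row of the picture.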

Coloring string in scripts

It is possible to color the background and foreground of echoed strings in scripts using ANSI escape sequences. The following script prints the 256-color palette of the terminal.

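#!/bin/bash
# Print each color code on a background of that same color.
color(){
    for c; do
        # \e[48;5;Nm selects background color N of the 256-color palette
        printf '\e[48;5;%dm%03d' "$c" "$c"
    done
    printf '\e[0m \n'    # reset attributes and end the row
}

IFS=$' \t\n'
color {0..15}            # the 16 standard colors
# the 6x6x6 color cube (codes 16..231), 36 colors per row
for ((i=0;i<6;i++)); do
    color $(seq $((i*36+16)) $((i*36+51)))
done
color {232..255}         # the 24 grayscale steps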
Palette in Linux terminal
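To go from a pixel to one of these palette entries we need a mapping from RGB values to a color code. A simple approximation, sketched here in Python, squeezes each channel into 6 levels and picks the matching slot of the 6x6x6 cube (codes 16 to 231). The linear rounding is my simplification: terminals do not space the cube levels evenly, so the result is close but not always the nearest color. The function names are hypothetical.

```python
def rgb_to_256(r: int, g: int, b: int) -> int:
    """Map 0..255 RGB values to an approximate slot of the 6x6x6 color cube."""
    def level(v: int) -> int:
        return round(v / 255 * 5)  # quantize one channel to 0..5
    return 16 + 36 * level(r) + 6 * level(g) + level(b)


def bg(code: int, text: str) -> str:
    """Wrap text in a 256-color background escape, then reset attributes."""
    return f"\x1b[48;5;{code}m{text}\x1b[0m"


def fg(code: int, text: str) -> str:
    """Same as bg(), but 38;5 selects the foreground color instead."""
    return f"\x1b[38;5;{code}m{text}\x1b[0m"


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]:
        code = rgb_to_256(*rgb)
        print(bg(code, f" {code:03d} "), fg(code, "sample"))
```

Here \x1b is the same escape character that the bash script writes as \e.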

Hacking NASA, reloaded

Stop wasting time! We just needed a method to quickly decode images in a terminal (but I also wanted to give you all the elements to understand my idea)… What about a program that decodes the image and outputs it as colored ASCII art in a terminal?

Here you can find the source code of a C program that does exactly this, only for 24-bit BMP, without needing other libraries or decoders: https://github.com/zagonico86/my-snippets/tree/main/bmp2txt
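To show the idea in a few lines, here is a minimal Python sketch of the same kind of conversion. It is not a port of the C program above and makes its own assumptions: it accepts only uncompressed, bottom-up 24-bit BMPs, it draws every pixel as two spaces with a colored background, and it uses 24-bit color escapes (48;2;R;G;B), which many modern terminals support but not all. The function name bmp24_to_ansi is made up.

```python
import struct


def bmp24_to_ansi(data: bytes) -> str:
    """Render an uncompressed 24-bit BMP as lines of colored spaces."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    pixel_offset, = struct.unpack_from("<I", data, 10)  # start of pixel data
    width, height = struct.unpack_from("<ii", data, 18)
    bpp, = struct.unpack_from("<H", data, 28)
    if bpp != 24 or height <= 0:
        raise ValueError("only bottom-up 24-bit BMPs are handled here")
    row_size = (width * 3 + 3) // 4 * 4  # rows are padded to 4 bytes
    lines = []
    for y in range(height - 1, -1, -1):  # the file stores the bottom row first
        row = pixel_offset + y * row_size
        cells = []
        for x in range(width):
            b, g, r = data[row + 3 * x: row + 3 * x + 3]  # stored as B, G, R
            cells.append(f"\x1b[48;2;{r};{g};{b}m  ")
        lines.append("".join(cells) + "\x1b[0m")
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny 2x1 test image built in memory: one red pixel, one blue pixel.
    pixels = bytes([0, 0, 255, 255, 0, 0, 0, 0])  # BGR, BGR, 2 padding bytes
    header = (struct.pack("<2sIHHI", b"BM", 54 + len(pixels), 0, 0, 54)
              + struct.pack("<IiiHHIIiiII", 40, 2, 1, 1, 24, 0,
                            len(pixels), 2835, 2835, 0, 0))
    print(bmp24_to_ansi(header + pixels))
```

For a real file you would pass the bytes read with open(path, "rb").read(); a large image should be scaled down first, since every pixel takes two terminal columns.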

At this point you decode the image and get confirmation of your suspicions:

The alien ET decoded by the program
Example of a rainbow rendered by the program

The program is just a starting point and can be greatly improved by using existing libraries to load any type of image. Moreover, it is possible to create a version that uses both background and foreground colors and more characters to render a more detailed ASCII art image (currently it uses only a handful of colors and characters). However, it is nice since it is self-contained and ready to use.

Oh no, the FBI is coming, run!! 🙂