2.8 Steganography
The science or art of hiding the very existence of a message is called steganography. Whereas encryption conceals your message by making it unreadable to the outsider, the aim of steganography is to hide the fact that a message is being communicated at all. You may have heard of invisible ink or of writing a letter with lemon juice. Those are types of steganography. An early example is recorded by Herodotus, writing circa 440 BC: Histiaeus shaved the head of his most trusted slave, tattooed a secret message on his scalp, and waited for the slave's hair to re-grow, thus obscuring the message from guards. The same method was used by the German army as recently as the early 20th century.
With the international legislation regulating complex encryption getting stricter, we are presented with the problem of upholding the right to the privacy of our information by legal means. Steganography does not try to present an outsider with the task of breaking a complex code, but instead aims to bypass his attention altogether. As there are no specific rules defining the exact nature of a steganographic message, it is very difficult to outlaw (for example, subliminal messages are a form of steganography). Some interesting recent developments in the field of linguistic steganography are discussed in this chapter.
There are two main methods of modern steganography. One is data steganography. It relates to hiding a message in an image, a photo, a sound file or 'within other data'. The other is linguistic steganography, i.e. using the language for sending a secret message – by symbols, ambiguous meanings, re-arrangement of letters and other forms of linguistic manipulation. Since linguistic steganography for computer systems is still purely theoretical, our discussion and examples will deal with more traditional message-hiding techniques.
Linguistic Steganography
Linguistic steganography has been gaining attention of late. In a sense, it is a step back from computer-assisted hiding and coding techniques, for it relies on a skill at which people are still more proficient than computers: the use and comprehension of language. Comprehension of words, their transformation into meaningful information, detection of humour, symbolism and ambiguities are all still the privileges of the human mind that have no parallels in the computer world. This section will explain some applications of linguistic steganography that may allow you to bypass modern technology-based surveillance systems. Be aware that steganography is not an exact science, and the advice offered in this chapter, especially on its linguistic application, should be applied with caution and tested prior to use in emergency or critical situations.
Our language is a code that appears incomprehensible to anyone who has not learned it. Computers cannot learn languages, and voice recognition software simply operates by detecting different frequencies in our voices and relating them to pre-programmed equivalents of letters. No matter how hard we try teaching computers to understand the meaning of words, such artificial intelligence (AI) remains a distant reality at present. Another language application that lies beyond computers is symbol recognition, applied by humans when reading. Symbol recognition has been used as a method of authentication (to prove that you are human, not a robot) when registering for different services online. The user has to manually input several letters shown to them on-screen. This system, called HIP (Human Interactive Proof), is designed to prevent automatic registration of email addresses by computer programs wishing to create email accounts for sending spam. Such programs cannot recognise letters in a picture. The AI community knows of many other problems a computer cannot easily solve, simply because no one has yet discovered how to build an intuition into its circuits.61
Semagrams
Semagrams are used to hide information through the use of signs and symbols. A visual semagram could relate to an arranged code that is transmitted by waving your hand, placing an item in a specific location on your desk or altering the look of your website. These signs are difficult to detect and have the advantage of normality in an everyday world. Sometimes the effective use of visual semagrams may be your only method of communication with your friends and colleagues, and it is important to establish and pre-arrange some messages that may need to be relayed in times of danger.
Text semagrams are symbolic messages encoded through the medium of text. Capitalised letters, accentuation, peculiar handwriting and blank spaces between words can all be used as signals for a pre-defined purpose. Subliminal messages also fall into this category. Text semagrams are sometimes useful when you wish to communicate a small bit of information. For instance, you could agree with your contacts to exchange seemingly innocuous daily weather reports by email. The phrase 'the sky is grey' may serve as an alert meaning you are in trouble and they should mobilise international help.
Open Codes
Open code steganography hides the message in a legitimate piece of text in ways not immediately obvious to the observer. Computers and humans have different abilities when it comes to steganalysis, or detecting steganographic messages (see below under the 'Detection' sub-heading). The following examples may not work against surveillance carried out by a human steganalyst. They use linguistic variations of the text to fool the common formulas used by electronic filters and surveillance systems. Please bear in mind that these can only be regarded as hints or suggestions for taking advantage of the non-intelligent nature of computer systems (e.g. keyword filtering software). They should not be used to communicate important information, but only to test the effectiveness of the filtering system. If you know that certain words in your email will result in its failure to reach the recipient, and this information alone will not get you into trouble, you can try out some of the variations below.
Misspellings
Since electronic filters are programmed to react to certain words, it is impossible to be sure how many variations of the spelling of a word have been considered. It is possible to retain the meaning of the word with some incredibly advanced misspelling! A phrase like ‘human rights’ could be also conveyed as:
hoomaine roites
umane reites
huumon writes
and many more. Whereas this technique is not practical for longer messages, you can reserve it for certain words that you think may have been included in the filtering systems.62
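Why misspellings can slip past a filter is easy to see in a small sketch. The Python below assumes the simplest possible filter, one that looks for exact phrases from a blocklist; the blocklist and the function name `is_blocked` are invented for illustration, and real filtering systems may be considerably more sophisticated (for example, by matching on similar-sounding words).

```python
# Hypothetical blocklist: the word lists used by real filters are not public.
BLOCKED_PHRASES = ["human rights"]


def is_blocked(message: str) -> bool:
    """Return True if any blocked phrase occurs in the message, ignoring case."""
    text = message.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


if __name__ == "__main__":
    samples = [
        "Report on human rights",
        "Report on hoomaine roites",
        "Report on huumon writes",
    ]
    for sample in samples:
        verdict = "blocked" if is_blocked(sample) else "passed"
        print(f"{sample!r}: {verdict}")
```

An exact-match filter of this kind stops only the first sample. This says nothing about a filter that you have not tested yourself.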
Phonetics
Most in-country filtering systems are aimed at specific keywords in the local language(s). Sometimes they may also include keywords of a popular second language used in the country or on websites (English, French). Again, one cannot be certain as to how exactly the filtering has been programmed, but for ease of understanding and variety, you can apply phonetic spelling to your message. This could be particularly useful if you are accustomed to using a script different from the one used in your country (e.g. Latin script for Arabic speakers or vice versa).
Houkok Al Insan
Jargon
Using jargon in your messages could render their content meaningless to an outside observer. Prearranged meanings or underground terminology can hide the real contents of the message. It is advisable to choose words in such a way that the carrier message remains legible and comprehensible, if not true. The possibilities for the use of jargon are limited only by the stock of words known to the communicating parties.
Covered Ciphers
Covered ciphers employ a particular method or secret to hide text in an open carrier message. Sometimes these include simple techniques of embedding a message into the words of the carrier. The advantage of using this method is that the carrier message may also appear as some relevant piece of communication and may not arouse suspicion as to any hidden meanings within it.
Consider the following site, which disguises your message as spam. If you have an Internet connection, enter the message 'Please help me' into www.spammimic.com/encode.shtml.
Dear Friend ; You made the right decision when you signed up for our mailing list . This is a one time mailing there is no need to request removal if you won't want any more . This mail is being sent in compliance with Senate bill 2116 ; Title 1 , Section 302 . This is not a get rich scheme ! Why work for somebody else when you can become rich inside 52 WEEKS . Have you ever noticed nobody is getting any younger and more people than ever are surfing the web...
Here, a spam message is mimicked to relay a hidden one within its content. The spam text is derived from a formula of words that is interchangeable depending on your message. It ensures that the spam is still readable and appears 'authentic'.
You can create your own messages that would use a standard format of a typical spam message or other format and agree a specific method of embedding text within it.
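As a rough illustration of agreeing on 'a specific method of embedding text', the Python sketch below hides each bit of the message in the choice between two interchangeable phrases. This is not how spammimic itself works (its method is not described here); the phrase table `SLOTS` and the function names are invented for this example, and both parties would need to hold the same table.

```python
# Each slot offers two interchangeable phrases: the first encodes bit 0,
# the second encodes bit 1. The table is hypothetical and must be shared
# in advance by sender and recipient.
SLOTS = [
    ("Dear Friend ;", "Dear Colleague ;"),
    ("This is a one time mailing .", "You will not hear from us again ."),
    ("This is not a get rich scheme !", "This offer is completely genuine !"),
    ("Act now .", "Do not delay ."),
]


def to_bits(message: str) -> list:
    """Turn the message into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in message.encode("utf-8")
            for shift in range(7, -1, -1)]


def from_bits(bits: list) -> str:
    """Rebuild the message from a list of bits (8 bits per byte)."""
    data = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        data.append(value)
    return data.decode("utf-8")


def encode(message: str) -> str:
    """Build a cover text: one phrase per bit, cycling through the slots."""
    bits = to_bits(message)
    return " ".join(SLOTS[i % len(SLOTS)][bit] for i, bit in enumerate(bits))


def decode(cover: str) -> str:
    """Recover the message by checking which phrase fills each slot."""
    bits, rest, i = [], cover, 0
    while rest:
        for bit, phrase in enumerate(SLOTS[i % len(SLOTS)]):
            if rest.startswith(phrase):
                bits.append(bit)
                rest = rest[len(phrase):].lstrip()
                break
        else:
            raise ValueError("cover text does not follow the agreed phrase table")
        i += 1
    return from_bits(bits)


if __name__ == "__main__":
    cover = encode("Please help me")
    print(cover[:120], "...")
    print(decode(cover))
```

With only four slots the cover text repeats itself quickly and would look odd to a human reader; a usable table would need many more slots and more natural variation.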
Future
The future of linguistic steganography will involve developing software that creates comprehensible text, in which the real message is hidden, using lexicons, ambiguities and word substitution. However, the experts are not yet sure whether computers will be capable of creating meaningful text from scratch and of hiding our messages in it using language semantics and schematics.
Data Steganography
The advent of computers has allowed us to begin embedding messages into pictures or sound files. To the human eye, the picture itself remains unchanged, yet within it there could be up to a book's worth of information.63
Computers, as you may know, operate in binary. That means that every letter, number and instruction is eventually broken down into a code of '1's and '0's. Let's say that the binary value describing the colour of one dot in a picture is
11101101
In a number like this, the very last '1' or '0' has the smallest influence on the value. If the last digit were '0' instead of '1', the value would drop from 237 to 236, and the colour would change by an amount too small for the eye to notice.
11101100
The last digit of a binary value, the one that contributes least to it, is known as the Least Significant Bit (LSB). Changing it is only harmless in data that tolerates small changes, such as colours or sound samples; in the code for a text character, changing the last bit would produce a different character. One method, used by data steganography software, is to spread the hidden message across the LSBs of the carrier in a pre-determined pattern. This does not noticeably change the carrier. Because each byte of the carrier holds only one bit of the hidden message, the hidden message cannot be bigger than the carrier and should really be much smaller.
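The LSB method can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular program: the carrier is assumed to be a plain sequence of bytes standing in for colour values, the 'pre-determined pattern' is assumed to be simply 'one bit per byte, starting from the first byte', and the function names `embed` and `extract` are invented for the example.

```python
def embed(carrier: bytes, message: bytes) -> bytes:
    """Write each bit of the message into the LSB of successive carrier bytes."""
    bits = [(byte >> shift) & 1
            for byte in message
            for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small: one carrier byte is needed per message bit")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        # Clear the last bit of the carrier byte, then set it to the message bit.
        stego[i] = (stego[i] & 0b11111110) | bit
    return bytes(stego)


def extract(stego: bytes, length: int) -> bytes:
    """Read `length` bytes of message back out of the LSBs."""
    message = bytearray()
    for i in range(length):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        message.append(value)
    return bytes(message)


if __name__ == "__main__":
    carrier = bytes([0b11101101] * 16)   # 16 identical bytes, enough for 2 letters
    stego = embed(carrier, b"OK")
    print(extract(stego, 2))
    # No carrier byte changes by more than 1.
    print(max(abs(a - b) for a, b in zip(carrier, stego)))
```

The recipient has to know the length of the message (here passed as `length`) and the pattern used; real programs typically store such details in a hidden header.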
Hiding in Images
Digital images (those that appear on your computer) are broken up into pixels: tiny dots, each with a specific colour, that together make up the image you can see. For images, steganographers encode the message into the LSBs of the pixels. This means that, to the human eye, the colour of each pixel (represented by binary code to the computer) does not appear to change. The hidden message can be extracted from the picture provided that you a) know there is a message in the image and b) use the same steganographic program for decoding as the one used to hide the message.
[Figure, four panels: the carrier image; a fragment from the photo, representing different values of individual pixels; the top two rows of the palette with the word 'OK' embedded into the LSBs; the resulting steganographic image.]
Source: The Code Book, Simon Singh
Note: Steganographic images are detectable. They do not appear any different to the human eye, but computers, when programmed to look for them, can notice the modifications of the LSB. It is for this reason that many security experts doubt the practicality of using steganography. Other methods, like encryption, can also be used to increase the security of information. Some programs will not only code your message into an image, but will encrypt it, too. The steganalysts (those responsible for decoding steganographic messages) would still have to break the encryption in the message extracted from the image.
Hiding in Audio
Steganography can also be applied to audio files. Take, for example, the MP3 format. It is a method of compressing an audio file to a much smaller size. This is achieved by removing the sounds that the human ear cannot pick up: our ears can only hear sounds within a particular range of frequencies. Natural audio, however, covers a much wider range, and removing the excess sounds does not significantly change the quality of the audio (to our ears). This is how MP3 files are made. Audio steganography adds the message to the parts of the audio that we cannot hear, and, once again, the human ear is unable to detect the difference in the sound quality.
[Figure: a frequency diagram of a piece of audio]
[Figure: the same piece of audio, with a message hidden within the frequency]
Whereas you may be able to detect the difference by looking at the diagrams, it is much more difficult to hear.
Source: Gary C. Kessler – An Overview of Steganography for the Computer Forensics Examiner
Hiding in Text
The steganographic principles can also be applied to a normal text file. Sometimes, this is done by hiding the message in the blank spaces between words. The bits of the message are spread across the blank spaces throughout the text, for instance by varying the number or type of space characters. Once again, this method requires the text you are sending to be considerably longer than the message you are hiding within it. You can also hide messages in PDF documents and in a variety of other formats, depending on which program you wish to use.
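One simple way of hiding bits in blank spaces can be sketched as follows. This Python example uses a deliberately basic scheme chosen for illustration, not the method of any particular program: a single space between two words stands for '0' and a double space for '1'. The function names `hide` and `reveal` are invented for the example.

```python
import re


def hide(cover: str, message: bytes) -> str:
    """Encode message bits in the gaps between words: ' ' is 0, '  ' is 1."""
    words = cover.split()
    bits = [(byte >> shift) & 1
            for byte in message
            for shift in range(7, -1, -1)]
    if len(bits) > len(words) - 1:
        raise ValueError("cover text too short: one gap between words is needed per bit")
    parts = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] else " "
        parts.append(gap + word)
    return "".join(parts)


def reveal(stego: str, length: int) -> bytes:
    """Read `length` bytes back from the width of the gaps between words."""
    gaps = re.findall(r" +", stego)[:length * 8]
    bits = [1 if len(gap) > 1 else 0 for gap in gaps]
    message = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        message.append(value)
    return bytes(message)


if __name__ == "__main__":
    cover = "the weather today is mild and the sky is clear over most of the " \
            "country with light winds expected in the north by evening"
    stego = hide(cover, b"Hi")
    print(repr(stego))
    print(reveal(stego, 2))
```

Each hidden byte needs eight gaps, so a two-letter message already requires seventeen words of cover text. Note also that many email programs and websites collapse repeated spaces, which would destroy a message hidden this way.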
Steganography software
There exist about a hundred different programs performing data, audio and text steganography. Each one uses its own particular method of arranging your message in the carrier file. Some of the better known are jphide and jpseek (http://linux01.gwdg.de/~alatham/stego.html), mp3stego (http://www.petitcolas.net/fabien/steganography/mp3stego/), as well as the commercial product Steganos Security suite (http://www.steganos.com). You can find many more at http://www.stegoarchive.com/.
Detection
Steganalysis is the process of detecting steganography. Although it is technically easy for computers to detect steganographic content, they must first be configured to look for it. The advantage of using steganography stems from the 'needle in a haystack' principle. Every day millions of images, MP3 files and plain text documents are passed around on the Internet. They do not arouse suspicion and, unlike encrypted messages, are not normally captured for analysis. When sending around photos of your last holiday, you can code a steganographic message into one of them. Sharing your music collection with a friend presents an opportunity to include a short message in one of the songs. You can imagine the impossibility of scanning the huge volume of information transmitted on the Internet for all types of steganographic content.
The 'needle in a haystack' principle only works if there is a 'haystack'. If you have always shared photos of your holiday or your favourite songs with your Internet contact, then the obscurity of your message increases when just another photo or song is sent. Don't use common or out-of-context images. Don't download images from the Internet and hide messages in them (the attacker could download the same image and compare the two digitally). In short, don't reveal your steganographic practices through an anomaly. Establish a pattern of communication, and use it sparingly for transmitting hidden messages. Do not rely on steganography alone to secure your communications. If the hidden message is revealed to the attacker, they should still be prevented from reading its content. Enhance the security of your message by encrypting it before hiding it in your carrier file.
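The warning about images downloaded from the Internet can be made concrete. The Python sketch below shows, under simplifying assumptions, how an attacker holding the original file could compare it with a suspect copy; both files are treated as raw sequences of bytes of equal length, and the function names are invented for the example. Real image formats need decoding before such a comparison is meaningful.

```python
def differing_positions(original: bytes, suspect: bytes) -> list:
    """Return the positions at which the two byte sequences differ."""
    return [i for i, (a, b) in enumerate(zip(original, suspect)) if a != b]


def looks_like_lsb_embedding(original: bytes, suspect: bytes) -> bool:
    """True if the files differ, and every difference is confined to the last bit."""
    if len(original) != len(suspect):
        return False
    diffs = differing_positions(original, suspect)
    return bool(diffs) and all((original[i] ^ suspect[i]) == 1 for i in diffs)


if __name__ == "__main__":
    original = bytes([200, 201, 202, 203, 204, 205, 206, 207])
    suspect = bytes([201, 201, 203, 203, 204, 204, 206, 207])
    print(differing_positions(original, suspect))
    print(looks_like_lsb_embedding(original, suspect))
```

A carrier that exists nowhere else, such as a photo you took yourself and never published, gives the attacker nothing to compare against.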