The goal of this lab is to find hidden evidence in files using more advanced techniques than Lab 1. Another goal is to become familiar with downloading, installing and using open source forensics utilities for Linux and Windows. The lab will be performed under the Windows virtual machine in Rm 315 (State Farm Lab) and under Linux (extra credit for the Linux portion).
In Windows, you need to become familiar with commands in the Windows command-line shell. Review the Windows command-line shell guide in the Resources section above and keep the guide available throughout the lab. Many Windows shell commands behave similarly to commands under Unix, although the names and syntax generally differ. Some features are identical. For example, under both Linux and Windows you will often redirect the output of your tools to a file for record-keeping purposes. Both operating systems use '>' to redirect standard output to a file (creating the file, or overwriting it if it already exists) and '>>' to append standard output to a file.
As mentioned in the first lecture, binary files have a certain sequence of binary digits (bits) that indicate the start of the binary file and the end of the binary file. These are often called the "file signatures", i.e. the starting and ending sequences that can be used to identify the file type. Refer to the File Signatures resource at the top of the lab for more information on file signatures.
Hidden data, also called "dark data", is evidence that cannot be accessed through normal channels by the operating system or application software. Since manual investigation of hard drives is generally no longer feasible in a real investigation, the anti-forensic techniques in common use attempt to hide evidence from automated forensic tools such as EnCase.
Two such anti-forensic techniques for creating dark data involve modifying or exploiting the file signatures of a file. One technique is to change the starting sequence (header bits) so that the operating system cannot automatically tell what the contents of the file are. Another technique is to insert one file past the ending sequence (EOF marker) of another file. Finding the hidden evidence is called file signature analysis. Review the File Signatures Database for reference.
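As a rough illustration of file signature analysis, the sketch below compares the first bytes of a file against a small table of well-known signatures. The table is only an illustrative subset and the function name identify is hypothetical; a real investigation would rely on a full signature database.

```python
# Illustrative subset of well-known file signatures (magic numbers).
SIGNATURES = {
    "JPEG": bytes.fromhex("FFD8FF"),
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "GIF89a": b"GIF89a",
    "GIF87a": b"GIF87a",
    "PDF": b"%PDF",
    "ZIP": bytes.fromhex("504B0304"),
    "OLE2 (legacy Word/Excel)": bytes.fromhex("D0CF11E0A1B11AE1"),
}


def identify(header: bytes) -> str:
    """Return the name of the first signature the data starts with."""
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


# In-memory examples; for a real file, pass the bytes read from its start.
print(identify(b"GIF89a" + bytes(10)))
print(identify(bytes.fromhex("FFD8FFE000104A464946")))
```

For a file on disk, reading the first 16 bytes in binary mode and passing them to identify is enough for every entry in this table.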
http://www.cs.csubak.edu/~melissa/cs340/mystery.stuff
This file could normally be displayed in a browser, but one hex digit (4 bits) in the file signature has been modified so that the browser cannot distinguish the file type. The file extension has also been changed. Your job is to figure out what the file signature should be and to modify it, along with the extension, so that you can open the file correctly in a viewer.
Open the file in the xvi32 hex editor you downloaded in Lab 1. Compare the hex in the existing header with the file signature table in the File Signatures link in Resources at the top of this lab. It should be fairly easy to see what the signature should be.
There is also some steganography hidden past the EOF marker for this file type. Find that also.
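The comparison against the signature table can also be sketched in code. The example below counts how many hex digits differ between a header and each known signature and reports those within one digit. The function names, the tiny signature table, and the damaged header are all made up for illustration; they are not taken from the lab file.

```python
def nibble_diff(a: bytes, b: bytes) -> int:
    """Count the hex digits at which two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a.hex(), b.hex()))


def near_matches(header: bytes, signatures: dict, max_diff: int = 1) -> list:
    """Names of signatures within max_diff hex digits of the header prefix."""
    hits = []
    for name, magic in signatures.items():
        prefix = header[:len(magic)]
        # Only compare when the header is long enough to cover the signature.
        if len(prefix) == len(magic) and nibble_diff(prefix, magic) <= max_diff:
            hits.append(name)
    return hits


# Hypothetical table and a made-up damaged header (not the lab file).
table = {
    "PDF": bytes.fromhex("25504446"),
    "ZIP": bytes.fromhex("504B0304"),
    "GIF89a": b"GIF89a",
}
damaged = bytes.fromhex("2550445600000000")  # one digit off from 25 50 44 46
print(near_matches(damaged, table))  # ['PDF']
```

An exact match is also reported, since a difference of zero digits is within the threshold.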
Criminals use file signatures in other ways as well. For example, you can modify the file signature for a GIF89a file (which is '47 49 46 38 39 61') into a Word signature (D0 CF 11 E0 A1 B1 1A E1). Then change the extension and Word will actually open the file but it will look like garbage. To recover the information you just rename the extension and insert the correct file signature for a GIF89a file using a hex editor and then open the file in any image viewer (this might be a good exam problem).
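The signature swap described above can be sketched as follows. This is a minimal illustration that assumes the disguise replaces the GIF89a signature with the Word signature at the very start of the file; because the two signatures have different lengths (6 and 8 bytes), the file length changes by two bytes. The function name swap_signature and the sample bytes are hypothetical.

```python
GIF89A = b"GIF89a"                        # 47 49 46 38 39 61
OLE2 = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Word signature


def swap_signature(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the leading signature old with new, leaving the rest intact."""
    if not data.startswith(old):
        raise ValueError("data does not start with the expected signature")
    return new + data[len(old):]


# Made-up stand-in for a GIF file: signature plus a few placeholder bytes.
original = GIF89A + b"\x01\x00\x01\x00placeholder"
disguised = swap_signature(original, GIF89A, OLE2)   # looks like a Word file
restored = swap_signature(disguised, OLE2, GIF89A)   # recovery step
print(restored == original)  # True
```

If the disguise instead overwrote eight bytes in place, the two bytes following the GIF signature would be lost and would have to be reconstructed separately.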
Create a WinHex directory on your shared drive. Click on the self-extracting WinHex.exe executable to extract the files. After extraction, click on setup.exe and install into the WinHex directory on your shared drive. Do not select the box for "Write protection by default" or the box for "Computer forensics interface." The evaluation copy of WinHex is sufficient for this lab, but should you continue in computer forensics, you will want the forensic features available in the full version.
A file has been inserted into the following image file. (You can do this by appending some bytes past the EOF marker for a jpg file in WinHex and then inserting a file into that area.) Your job is to extract that file. Download the following file to your shared drive:
http://www.cs.csubak.edu/~melissa/cs340/lab2.jpg
As you can see, the image looks just like a couple of really cute penguins. There is no indication that another file exists in this file. Open the file in WinHex. Past the EOF marker, you should start to see the other file. Paste that file into another window. Figure out the file type using file signature analysis. Save the file. View it.
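The manual WinHex procedure can be approximated in code. The sketch below assumes the hidden data starts immediately after the first JPEG end-of-image marker (FF D9). That is a simplification, since a JPEG with an embedded thumbnail can contain an earlier FF D9. The function name data_after_eoi and the sample bytes are hypothetical.

```python
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def data_after_eoi(data: bytes) -> bytes:
    """Return everything that follows the first FF D9 marker, if any."""
    end = data.find(JPEG_EOI)
    if end == -1:
        return b""  # no EOI marker found, nothing to carve
    return data[end + len(JPEG_EOI):]


# Made-up carrier: JPEG start marker, placeholder body, end marker.
carrier = bytes.fromhex("FFD8FFE0") + b"image data" + JPEG_EOI
hidden = b"%PDF-1.4 hidden payload"
extracted = data_after_eoi(carrier + hidden)
print(extracted[:4])  # b'%PDF'
```

The first bytes of the extracted data can then be checked against the file signature table to decide which extension to save it under.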
We will be using an open source tool called pasco. Pasco is part of the Odessa suite of forensics tools developed by Jones at Foundstone, but you don't need the other tools from Odessa for this lab. Pasco extracts records from index.dat and displays the records in human-readable form on the screen.
Open up a terminal window while logged in to the cs340 account for this part of the lab. Download an open source IE activity analysis tool named pasco using the following commands:
    mkdir lab2
    cd lab2
    wget http://sourceforge.net/projects/odessa/files/Pasco/20040505_1/pasco_20040505_1.tar.gz/download
    mv download pasco_20040505_1.tar.gz
    tar -xvzf pasco_20040505_1.tar.gz
Change into the pasco directory and follow the instructions in the readme file to compile the pasco executable. Once that is done, copy the binary (or better yet create a symlink) into your lab2 directory using the following command:
    cp pasco $HOME/lab2
OR
    cd $HOME/lab2
    ln -s {pasco_dir}/pasco .    # replace {pasco_dir} with the correct directory
You can now use pasco to convert the binary index.dat into human-readable text. Use the following wget command to grab the example history file:
    wget http://www.cs.csubak.edu/~melissa/cs340/index.dat
You can run pasco with the following command:
    ./pasco index.dat > outfile
Your job is to find and extract all records from outfile pertaining to Internet activity that does not involve the csub or nytimes domains. Sort these extracted records by URL. Now exclude any URLs that appear to point to advertisement sites such as googleads, addirector, etc. You can do this before or after you sort the file. You can do everything with the Unix commands grep, sort and cut. See Unix help for exact syntax. Some commands to get you started (you will need to add additional grep commands to filter out all the ad sites):
    grep -vE '(csub|nytimes)' outfile | grep -v 'googleads' > outfile2
    sort -u outfile2 > lab2_task3
There should be around 50 records left after removing the specified sites.
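The same filter-and-sort idea can be sketched outside the shell. The example below assumes that each record is a tab-separated line with the URL in the second field; that layout is an assumption about the pasco output and should be checked against your own outfile. The function name filter_records and the sample records are made up. Unlike grep -v, which matches anywhere in the line, this sketch only looks at the URL field.

```python
def filter_records(lines, excluded, url_column=1):
    """Drop records whose URL contains an excluded word; sort the rest by URL."""
    kept = set()  # a set removes duplicate records, like sort -u
    for line in lines:
        fields = tuple(line.rstrip("\n").split("\t"))
        if len(fields) <= url_column:
            continue  # no URL field on this line (e.g. a blank line)
        url = fields[url_column].lower()
        if not any(word in url for word in excluded):
            kept.add(fields)
    ordered = sorted(kept, key=lambda f: (f[url_column], f))
    return ["\t".join(f) for f in ordered]


# Made-up sample records, not taken from the lab's index.dat.
sample = [
    "URL\thttp://www.example.org/b\n",
    "URL\thttp://www.csub.edu/\n",
    "URL\thttp://googleads.example.net/x\n",
    "URL\thttp://www.example.com/a\n",
    "URL\thttp://www.example.com/a\n",
]
result = filter_records(sample, ["csub", "nytimes", "googleads"])
for record in result:
    print(record)
```

Any header lines that happen to contain a tab will pass through this sketch, so they may need to be removed by hand.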
Some tasks that can be helpful are learning how to list files, run executables, change directories, and move/copy files. You might also want to set up a Windows batch file to point to where you are storing executable files on the shared data directory. To do this on Windows, create a text file in your shared data directory called 'setpath.bat' using any text editor. Inside the file, you can add items to your path. For example, to add the shared data directory to your path, you would do the following:
    SET PATH=Z:\CS340 Data Directory\;%PATH%
You can add additional directories using a semicolon-separated list. Once you save the file, you can open a Windows command line shell, change to the directory with the file, and type the following command to run the file:
setpath
File analysis is performed in a forensic investigation to determine the type of a file (compressed, text, image, executable, etc.). File analysis is necessary if the file signature has been tampered with. A measure of entropy is used in this process.
In information theory, entropy (H) is a measure of the amount of uncertainty in a random variable. In simple terms, if an event is certain to occur, then that event carries no information. A bit string of all zeros has an entropy of 0. The maximum amount of information is carried when events are completely random. In computer science, entropy is usually expressed as the average number of bits needed to store or transmit information. An entropy test can determine whether a file is compressed or uncompressed. For a random variable X with n equally likely outcomes:
    H(X) = lg(n)
For a completely random 7-bit ASCII character there are n = 2^7 = 128 possible values, so H(X) = lg 128 = 7 bits. For an ASCII character in English text (which is far from random), H is between 1.0 and 1.5 bits. WinHex/X-Ways analyzes the entropy of an "unknown" file by computing a byte value distribution (see example). From this, X-Ways determines the type of the file.
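An entropy measurement over a byte value distribution can be sketched as follows. This uses the general Shannon formula H = sum of p * lg(1/p) over the observed byte values, which reduces to lg(n) when all n values are equally likely. The function name byte_entropy is hypothetical, and this is not a description of how WinHex/X-Ways implements its analysis.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte value distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(byte_entropy(bytes(1000)))        # all zeros: 0.0
print(byte_entropy(bytes(range(128))))  # 128 equally likely values: 7.0
print(byte_entropy(bytes(range(256))))  # 256 equally likely values: 8.0
```

Compressed or encrypted data tends to score close to 8 bits per byte, while plain text scores noticeably lower, which is what makes the measurement useful when a signature has been tampered with.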