PDF (Portable Document Format) files are used daily, both at work and by private individuals. This also makes them a popular tool for cybercriminals in phishing attacks. PDFs can execute embedded scripts, which can in turn download additional malware.
The fastest way to determine whether a PDF document is malicious is to look up its cryptographic hash value on an online service like VirusTotal. If the hash value is not yet known there, dynamic analysis in a sandbox can provide additional information about the document. But beware of free sandbox services: if the document is a legitimate internal one, uploading it potentially makes it publicly accessible! For this reason, static analysis in a virtual machine is a useful alternative.
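The hash lookup can be prepared locally without uploading the document itself. The following is a minimal sketch that assumes SHA-256 is the hash used for the lookup; the function name `sha256_of_file` is chosen here for illustration and does not come from any of the tools discussed in this article.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string can be pasted into the search field of the online service, so only the hash leaves the analysis machine, not the document.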
The following process is used to statically analyse a PDF document and is described in more detail in this article:
- Use pdfid.py[1] to get an overview of risky features of the PDF file and assess its properties.
- Use pdf-parser.py[2] to examine specific objects in more detail.
- Extract embedded code, e.g. shellcode, PowerShell, JavaScript, and de-obfuscate if needed (not further explained in this article).
In this article, the two PDF files shown, “View Attached Invoice.pdf”[3] and “Material.pdf”[4], are analysed using this method as examples.
Caution! All of the following activities should be performed in a virtual machine without network access. When analysing malicious PDFs, the surrounding system may be compromised.
Embedded programs are started only with the user’s permission: when the document is opened, a window with a security warning appears[5] (see Figure 3). This security setting is enabled by default in Adobe Acrobat and Reader X, 9.3 and 8.2[6]. The user must then decide whether to allow or block access, as shown in the following figure for the second document. If an attacker has sent a malicious PDF file under a credible pretext, there is a high probability that the user will ignore this warning and allow the malicious code it contains to be executed.
Analysis With pdfid.py
pdfid.py is a tool by Didier Stevens that scans a PDF file for specific keywords as a first triage step. For the first document, “View Attached Invoice.pdf”, the result of pdfid.py is as follows:
Here, the /Page count shows that the document is one page long, which is common for malicious PDFs. Otherwise, nothing stands out at first glance, as the other keywords have a count of zero. Accordingly, the document contains no JavaScript (/JavaScript and /JS) and no embedded interactive forms (/AcroForm and /XFA). It does not launch any external or embedded programs (/Launch and /EmbeddedFile), nor does it define any actions that run automatically (/AA and /OpenAction).
Although there are no clear signs of a malicious document, it is recommended to continue the investigation with another tool such as pdf-parser.py. pdfid.py is not comprehensive and does not indicate, for example, whether the document contains links that could be malicious. Therefore, the content of the document should be analysed in more detail.
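To illustrate the idea behind this kind of keyword triage, here is a simplified sketch in Python. It is not pdfid.py's implementation: the keyword list is limited to the names discussed above, the function name `count_keywords` is hypothetical, and the sketch handles neither hex-escaped names (such as `/J#53`) nor keywords hidden inside compressed object streams.

```python
import re

# Keywords discussed in the text above; pdfid.py itself checks a longer list.
KEYWORDS = [
    b"/Page", b"/JS", b"/JavaScript", b"/AA", b"/OpenAction",
    b"/AcroForm", b"/XFA", b"/Launch", b"/EmbeddedFile",
]


def count_keywords(data):
    """Count how often each keyword occurs as a complete PDF name in raw bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # The negative lookahead ensures the name ends here,
        # so that /Page does not also match /Pages.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >> endobj\n"
        b"3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
        b"4 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    )
    for name, count in count_keywords(sample).items():
        print(f"{name:15} {count}")
```

A non-zero count for any keyword other than /Page is a reason to inspect the corresponding objects more closely; a count of zero, as the text notes, does not prove that the document is harmless.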
Analysis With pdf-parser.py
pdf-parser.py is also a tool developed by Didier Stevens. It parses the various elements of the PDF file and displays their content.
This tool can also search for keywords using the “-s” parameter. In this example, searching for “URI” returns the following result:
The URI (see red box in Figure 5) is the target of a link annotation (see “Type: /Annot”) in object 12.
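The same kind of search can be approximated with a few lines of Python when the link is stored as an uncompressed literal string. This is a rough sketch rather than a replacement for pdf-parser.py: the function name `extract_uris` is hypothetical, the sample object and its URL are invented for illustration, and URIs written as hex strings, containing unescaped parentheses, or stored inside compressed object streams will be missed.

```python
import re

# Matches "/URI (...)" where the value is a PDF literal string.
# Backslash-escaped characters inside the string are tolerated.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def extract_uris(data):
    """Return all literal-string /URI values found in the raw PDF bytes."""
    return [match.decode("latin-1") for match in URI_PATTERN.findall(data)]


if __name__ == "__main__":
    # Invented sample object; not taken from the analysed document
    sample = (
        b"12 0 obj << /Type /Annot /Subtype /Link "
        b"/A << /S /URI /URI (https://example.invalid/login) >> >> endobj"
    )
    print(extract_uris(sample))
```

Any URL found this way should be checked through a lookup service from a separate system rather than opened in a browser inside the analysis environment.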
A search[7] on urlscan.io[8] shows that this is a phishing page targeting Adobe users. The screenshot below shows that it asks for the user’s email credentials in order to read the document.
So the goal of the PDF document is to obtain the targeted user’s credentials for their Adobe Cloud account.
Analysis With peepdf.py
peepdf.py[9] can be used as an alternative to pdfid.py and pdf-parser.py. It provides an interactive shell that can be used to navigate through the structure of the PDF file and search its content. The tool also indicates when it finds suspicious elements.
For Material.pdf, the result is as follows:
peepdf.py found three suspicious objects, one in version 0 and two in version 2 of the document. /AcroForm (object 7) and /Launch (objects 173 and 65) can be used to embed interactive forms and launch external or embedded programs, respectively.
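The version numbers reported above suggest that the file was modified after its initial creation, assuming that peepdf.py's versions correspond to incremental updates, each of which appends a new section ending in an `%%EOF` marker. The following sketch lists the offsets of these markers as a rough cross-check. It is a heuristic only and the function name `eof_offsets` is hypothetical: linearized PDFs, for example, can contain two markers without having been updated.

```python
def eof_offsets(data):
    """Return the byte offsets of all %%EOF markers in the raw PDF data."""
    marker = b"%%EOF"
    offsets = []
    start = 0
    while True:
        position = data.find(marker, start)
        if position == -1:
            break
        offsets.append(position)
        # Continue searching after the marker that was just found
        start = position + len(marker)
    return offsets


if __name__ == "__main__":
    # Invented, heavily shortened sample with one appended update
    sample = b"%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n2 0 obj\nendobj\n%%EOF\n"
    print(len(eof_offsets(sample)), "end-of-file markers found")
```

More than one marker is not malicious in itself, but it indicates where content may have been appended to the original document.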
Object 173 is of particular interest here because it points to a URL:
This URL was also visible in the warning message (see Figure 3).
A search on urlscan.io[10] shows that this URL is used to download the file “hgfetb.R11”. In addition, VirusTotal[11] indicates that this file has been classified as malicious by several security products.
Thus, the aim of this PDF file is to make the victim download malicious software.
Conclusion
As a useful alternative to sandboxes, a static analysis can quickly determine whether a PDF document is malicious and what its characteristics are. Since such documents are designed to use social engineering to trick users into allowing the embedded programs to run, it is recommended that employees be trained accordingly – as they are the first line of defense against such attacks.
Do you have any questions or need assistance with implementation? The Digital Forensics and Incident Response Team is happy to help. We look forward to hearing from you!