DFIR, Simple: Analysis of PDF Files

Nadia Meichtry

06.07.2022

(updated on: 09.09.2024)

PDF (Portable Document Format) files are used daily, both at work and by private individuals. This also makes them a popular tool for cybercriminals to use in phishing attacks. PDF files can directly execute scripts that download additional malware.

The fastest way to determine if a PDF document is malicious is to look up its cryptographic hash value on an online service like VirusTotal. If the hash value is not yet known there, dynamic analysis using a sandbox can provide additional information about the document. But beware of free sandbox services: if the document is a legitimate internal one, uploading it potentially makes it publicly accessible! For this reason, a static analysis in a virtual machine is a worthwhile alternative.
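The hash lookup only requires the hash itself, so the document never has to leave the analysis machine. The following is a minimal sketch of computing a SHA-256 value with the Python standard library; the function name `sha256_of_file` and the chunk size are assumptions made for illustration, and the resulting hex string would be pasted into the search field of the lookup service by hand.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large documents do not
    have to fit into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("Material.pdf")` returns a 64-character hexadecimal string that identifies the exact file content, independent of the file name.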

The following process is used to statically analyse a PDF document and is described in more detail in this article:

Use pdfid.py[1] to get an overview of risky features of the PDF file and assess its properties.
Use pdf-parser.py[2] to examine specific objects in more detail.
Extract embedded code, e.g. shellcode, PowerShell, JavaScript, and de-obfuscate if needed (not further explained in this article).

In this article, two PDF files, “View Attached Invoice.pdf”[3] and “Material.pdf”[4], are analysed using this method as examples.

Caution! All of the following activities should be performed in a virtual machine without network access. When analysing malicious PDFs, the surrounding system may be compromised.

*Figure 1: “View Attached Invoice.pdf” in a PDF reader*

*Figure 2: “Material.pdf” in a PDF reader*

Embedded programs are started only with the user’s permission: when the document is opened, a window with a security warning appears [5], as shown in Figure 3 for the second document. This security setting is enabled by default in Adobe Acrobat and Reader X, 9.3 and 8.2[6]. The user must then decide whether to allow or block access. If an attacker has sent a malicious PDF file under a credible pretext, there is a high probability that the user will ignore this warning and allow the contained malicious code to be executed.

*Figure 3: Security warning when opening the “Material.pdf” file*

Analysis With pdfid.py

pdfid.py is a tool by Didier Stevens that searches a PDF file for specific keywords as an initial analysis step. For “View Attached Invoice.pdf”, the result of pdfid.py is as follows:

*Figure 4: Result of pdfid.py for “View Attached Invoice.pdf”*

Here, the /Page count shows that the document is one page long, which is common for malicious PDFs. Otherwise, nothing stands out at first glance, as the other keywords have a count of zero. Accordingly, the document contains no JavaScript (/JavaScript and /JS) and no embedded interactive forms (/AcroForm and /XFA). It does not launch any external or embedded programs (/Launch and /EmbeddedFiles), nor does it define actions that run automatically (/AA and /OpenAction).
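To illustrate the idea behind such a keyword overview, here is a minimal sketch that counts the keywords discussed above in the raw bytes of a file. It is not how pdfid.py is implemented: the names `KEYWORDS` and `count_keywords` are hypothetical, and the sketch ignores names obfuscated with `#xx` hex escapes as well as anything hidden inside compressed streams.

```python
import re

# Keywords taken from the discussion above; the list is illustrative only.
KEYWORDS = [
    b"/Page", b"/JS", b"/JavaScript", b"/AA", b"/OpenAction",
    b"/AcroForm", b"/XFA", b"/Launch", b"/EmbeddedFiles",
]


def count_keywords(data):
    """Count how often each keyword occurs as a complete name in `data`."""
    counts = {}
    for keyword in KEYWORDS:
        # The negative lookahead keeps /Page from also matching /Pages.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

A typical call would be `count_keywords(open("sample.pdf", "rb").read())`, where `sample.pdf` stands for the file under analysis. A count of zero in this sketch only means the keyword does not appear in plain form.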

Even without clear signs of a malicious document, it is recommended to continue the investigation with another tool such as pdf-parser.py. pdfid.py is not comprehensive and does not indicate, for example, whether the document contains links that could be malicious. Therefore, the content of the document should be analysed in more detail.

Analysis With pdf-parser.py

pdf-parser.py is also a tool developed by Didier Stevens. It parses the various elements of the PDF file and displays their content.

This tool also offers the possibility to search for keywords using the “-s” parameter. In this example, searching for “URI” returns the following result:

*Figure 5: Result of searching for URI with pdf-parser.py in “View Attached Invoice.pdf”*

This URI (see red box in Figure 5) belongs to a link annotation (see “Type: /Annot”) in object 12.
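The same kind of search can be approximated in a few lines for quick triage. The sketch below pulls the string that follows a `/URI` key out of the raw bytes; the names `URI_PATTERN` and `extract_uris` are hypothetical, and the approach assumes the object is stored uncompressed and the URI is written as a literal string in parentheses, which will not hold for every document.

```python
import re

# Matches "/URI (...)" and captures the text between the parentheses.
# Escaped characters such as "\)" inside the string are allowed.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def extract_uris(data):
    """Return all literal-string URI values found in the raw PDF bytes."""
    return [match.group(1).decode("latin-1")
            for match in URI_PATTERN.finditer(data)]
```

The extracted values should be treated as untrusted input and looked up on a scanning service rather than opened in a browser.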

A search[7] on urlscan.io[8] shows that this is a phishing page impersonating Adobe. The screenshot below shows that the page asks for the user’s email credentials in order to read the document.

*Figure 6: Result of search on urlscan.io*

So the goal of the PDF document is to obtain the targeted user’s credentials for the Adobe Cloud account.

Analysis With peepdf.py

peepdf.py[9] can be used as an alternative to pdfid.py and pdf-parser.py. It provides an interactive shell that can be used to navigate through the structure of the PDF file and search its content. The tool also indicates when it finds suspicious elements.

For Material.pdf, the result is as follows:

*Figure 7: Result of peepdf.py for “Material.pdf”*

peepdf.py found three suspicious objects, one in version 0 and two in version 2 of the document. /AcroForm (object 7) and /Launch (objects 173 and 65) can be used to embed interactive forms and launch external or embedded programs, respectively.
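As a rough illustration of how keywords can be tied to object numbers, the following sketch splits the raw bytes into `obj ... endobj` blocks and reports which object IDs contain /AcroForm or /Launch. This is not how peepdf.py works internally: the names `OBJECT_PATTERN`, `SUSPICIOUS` and `find_suspicious_objects` are hypothetical, document versions are not distinguished, and objects stored inside compressed object streams are not seen at all.

```python
import re

# "<id> <generation> obj ... endobj", with the body captured lazily.
OBJECT_PATTERN = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)

# Keywords flagged in the analysis above; illustrative only.
SUSPICIOUS = (b"/AcroForm", b"/Launch")


def find_suspicious_objects(data):
    """Map each suspicious keyword to the IDs of the objects containing it."""
    hits = {}
    for match in OBJECT_PATTERN.finditer(data):
        object_id = int(match.group(1))
        body = match.group(3)
        for keyword in SUSPICIOUS:
            if keyword in body:
                hits.setdefault(keyword.decode("ascii"), []).append(object_id)
    return hits
```

Each reported object ID can then be inspected in detail with a dedicated parser.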

Object 173 is of particular interest here because it points to a URL:

*Figure 8: Analysis of object 173 with peepdf.py*

This URL was also visible in the warning message (see Figure 3).

A search on urlscan.io[10] shows that this URL is used to download the file “hgfetb.R11”. In addition, VirusTotal[11] indicates that this file has been classified as malicious by several security products.

*Figure 9: Result of scan of file “hgfetb.R11” with VirusTotal*

Thus, the aim of this PDF file is to make the victim download malicious software.

Conclusion

As a useful alternative to sandboxes, a static analysis can quickly determine whether a PDF document is malicious and what its characteristics are. Since such documents are designed to use social engineering to trick users into allowing the embedded programs to run, it is recommended that employees be trained accordingly – as they are the first line of defense against such attacks.

Author

Nadia Meichtry studied forensic science at the University of Lausanne and completed her master’s degree in digital forensics in summer 2020. She is a certified GIAC Certified Forensic Analyst (GCFA) and GIAC Reverse Engineering Malware (GREM), and certified OSSTMM Professional Security Tester (OPST) and joined Oneconsult in August 2020 as a Digital Forensics & Incident Response Specialist.
