Pentest Chronicles

The Hidden Danger in PDFs: How Misconfigurations Can Expose Sensitive Data?

Patryk Bogdan

January 28, 2025

Overview

A recent security audit revealed a critical vulnerability in the way an application passes user-provided data to WeasyPrint when generating invoices in PDF format. The issue stems from insufficient input validation, which lets attackers inject malicious HTML tags that are rendered within the generated PDF. This flaw opens the door to extracting sensitive files from the application’s infrastructure or querying remote resources, posing significant security risks.

Vulnerability Breakdown

The application allows user input, such as names or surnames, to be embedded directly into PDFs via the WeasyPrint engine. However, the lack of proper sanitization permits attackers to inject HTML tags into the input. For instance:

• Tags like <b> or <h3> enable attackers to manipulate the formatting of PDF content.
• Tags like <link> allow the inclusion of external files as PDF attachments.
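As a rough illustration of the injection described above, the sketch below builds invoice HTML by naive string interpolation. The template, the `build_invoice_html` function, and the payload are hypothetical and are not taken from the audited application; `<link rel="attachment">` is the WeasyPrint feature that attaches the linked resource to the output PDF.

```python
# Hypothetical invoice template; the audited application's real template is not known.
INVOICE_TEMPLATE = "<html><body><h1>Invoice</h1><p>Customer: {name}</p></body></html>"


def build_invoice_html(name: str) -> str:
    """Vulnerable pattern: user input is interpolated into HTML without escaping."""
    return INVOICE_TEMPLATE.format(name=name)


# Illustrative payload: asks the renderer to attach a local file to the PDF.
payload = 'John<link rel="attachment" href="file:///etc/passwd">'
html_doc = build_invoice_html(payload)

# The injected tag survives intact, so an HTML-to-PDF engine treats it as markup.
print('<link rel="attachment"' in html_doc)
```

Because the tag reaches the renderer unchanged, whatever the renderer is allowed to fetch becomes reachable through a simple form field.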

Such crafted inputs exploit WeasyPrint's default configuration, enabling unauthorized access to files on the server or external sources. This capability can be leveraged to extract sensitive system data and perform internal reconnaissance.

Real-World Exploitation

During testing, auditors demonstrated the severity of this vulnerability. Files retrieved included:

• Payment operator tokens
• Credentials for SMS and email gateways
• PostgreSQL database access credentials
• Hosting system access credentials
• JWT encryption keys

Additionally, the vulnerability allowed querying of internal infrastructure from the server processing the PDFs, highlighting its potential for lateral movement within the application environment.

Finding Vulnerability

Now, let’s move on to the interesting part! To identify this vulnerability and provide proof of its existence, I sent a request carrying a malicious payload. The server's response showed that it accepted the payload and started generating the PDF.

I then downloaded the PDF file generated by the malicious payload and used a short Python script (e.g., ex.py) to extract the attachment from it. After running the script, the extracted file appeared in the working directory. Its contents revealed very sensitive environment variables.

That is not all! I then downloaded a second PDF file generated from another malicious payload. Extracting its attachment displayed the contents of /etc/passwd, confirming unauthorized file access.

Root Cause

This vulnerability stems from the default configuration of WeasyPrint, which allows unrestricted access to local and external files. Without stringent input validation and output sanitization, the software effectively serves as a bridge for unauthorized data extraction.
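The attachment-extraction step from the walkthrough can be approximated as follows. This is a stand-in sketch and not the ex.py script used in the audit: it assumes the third-party pypdf package is installed, and the names `extract_attachments`, `safe_name`, and the `extracted` output directory are illustrative.

```python
from pathlib import Path, PurePosixPath


def safe_name(name: str) -> str:
    """Drop directory components so an attachment name cannot point outside out_dir."""
    cleaned = PurePosixPath(name.replace("\\", "/")).name
    return cleaned or "attachment.bin"


def extract_attachments(pdf_path: str, out_dir: str = "extracted") -> list[Path]:
    """Write every embedded file in the PDF to out_dir and return the written paths."""
    # Assumption: pypdf is available; imported lazily so safe_name works without it.
    from pypdf import PdfReader

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    reader = PdfReader(pdf_path)
    # reader.attachments maps each attachment name to a list of byte strings.
    for name, contents in reader.attachments.items():
        for index, data in enumerate(contents):
            path = target / f"{index}_{safe_name(name)}"
            path.write_bytes(data)
            written.append(path)
    return written
```

Calling `extract_attachments("invoice.pdf")` on a PDF produced from an injected payload would write each embedded file into the output directory, where it can be inspected like any other file.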

To address this vulnerability, organizations should:

1. Implement Input Validation and Sanitization
User-generated data should be rigorously sanitized to strip out any HTML or script tags before being incorporated into documents.
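One simple way to neutralize injected markup is to escape user input before it reaches the template, so that tags are rendered as literal text. Note that this is escaping, which is a different technique from stripping tags. The sketch below uses Python's standard `html.escape`; the `render_customer_field` function and the payload are hypothetical.

```python
import html


def render_customer_field(name: str) -> str:
    """Escape user input so any markup in it is rendered as literal text."""
    return f"<p>Customer: {html.escape(name, quote=True)}</p>"


# Illustrative payload of the kind described in this article.
payload = 'John<link rel="attachment" href="file:///etc/passwd">'
print(render_customer_field(payload))
```

Template engines with autoescaping enabled achieve the same effect, provided the user-supplied value is never marked as safe.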

2. Restrict Resource Access
Limit the software's access to local and external files, allowing only resources essential for its operation.
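A possible way to restrict resource access is a custom URL fetcher that denies everything except an explicit allowlist. The sketch below assumes a WeasyPrint release that exposes `default_url_fetcher` and accepts a `url_fetcher` callable in `HTML(...)`; the allowlisted host and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only HTTPS assets from a host the invoice template needs.
ALLOWED_HOSTS = {"static.example.com"}


def is_allowed(url: str) -> bool:
    """Allow only https:// URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def restricted_fetcher(url: str, *args, **kwargs):
    """Deny by default; delegate allowlisted URLs to WeasyPrint's default fetcher."""
    if not is_allowed(url):
        # Blocks file://, plain http://, internal addresses, and data: URIs alike.
        raise ValueError(f"Blocked resource: {url}")
    # Assumption: imported lazily so the policy can be checked without WeasyPrint.
    from weasyprint import default_url_fetcher
    return default_url_fetcher(url, *args, **kwargs)


def render_invoice(html_doc: str) -> bytes:
    """Render HTML to PDF bytes with the restricted fetcher in place."""
    from weasyprint import HTML
    return HTML(string=html_doc, url_fetcher=restricted_fetcher).write_pdf()
```

With this policy, a `file:///etc/passwd` reference is rejected before any read happens. The fetcher complements input escaping and does not replace it.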

3. Environment Hardening
a) Segregate sensitive configuration files across different machines.
b) Adopt the principle of least privilege for processes involved in PDF generation.



