XPath (XML Path Language) is a powerful query language used to navigate and select nodes in XML documents. Whether you're parsing XML files, scraping web data, or writing automated tests, mastering XPath expressions is essential. This tutorial combines categorized expressions with function-focused examples to provide a comprehensive understanding of XPath.
🔍Want to try XPath expressions as you go?
This XPath tester is a handy tool that lets you experiment with XPath queries with our sample XML document.
Sample XML Document
We'll use the following XML document for our examples:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
Categorized XPath Expressions
XPath expressions can be grouped based on how they select and filter nodes. Understanding these categories helps you write better queries faster.
Selecting Nodes
XPath uses path expressions to select nodes in an XML document. A node is selected by following a path made up of steps.
Expression | Description | Result |
/bookstore | Selects the root element bookstore | <bookstore>...</bookstore> |
/bookstore/book | Selects all book elements under bookstore | All <book> elements |
//book | Selects all book elements in the document | All <book> elements |
bookstore//book | Selects all book elements under bookstore, at any level | All <book> elements |
//@lang | Selects all attributes named lang | lang="en" from each <title> element |
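As a rough illustration, the path expressions above can be tried from Python. This is a minimal sketch that assumes the standard-library xml.etree.ElementTree module, which implements only a subset of XPath and evaluates paths relative to an element rather than the document. The XML is a trimmed copy of the sample document (authors and years omitted).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore/book: book elements that are direct children of the root
direct_books = root.findall("book")

# //book: book elements at any depth below the root
all_books = root.findall(".//book")

# //@lang: ElementTree cannot return attribute nodes, so collect the values instead
langs = [el.get("lang") for el in root.iter() if "lang" in el.attrib]

print(len(direct_books), len(all_books), langs)
```

Because every book sits directly under bookstore in this sample, the first two queries return the same four elements.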
Predicates
Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
Expression | Description | Result |
/bookstore/book[1] | Selects the first book element under bookstore | First <book> element |
/bookstore/book[last()] | Selects the last book element under bookstore | Last <book> element |
/bookstore/book[position()<3] | Selects the first two book elements under bookstore | First two <book> elements |
//title[@lang] | Selects all title elements with a lang attribute | All <title> elements with lang attribute |
//title[@lang='en'] | Selects all title elements with lang attribute equal to 'en' | All <title lang="en"> elements |
/bookstore/book[price>35.00] | Selects all book elements with price greater than 35.00 | <book> elements with price > 35.00 |
/bookstore/book[price>35.00]/title | Selects the title of books with price greater than 35.00 | <title> elements of books with price > 35.00 |
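The attribute and value predicates above might be reproduced in Python roughly as follows. This sketch assumes the standard-library xml.etree.ElementTree module, which understands attribute predicates such as [@lang='en'] but not numeric comparisons such as [price>35.00], so the price filter is written in plain Python. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# //title[@lang='en']: supported directly by ElementTree
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

# /bookstore/book[price>35.00]/title: the comparison is done in Python
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]

print(english_titles)
print(expensive_titles)  # ['XQuery Kick Start', 'Learning XML']
```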
Selecting Unknown Nodes
XPath wildcards can be used to select unknown XML nodes.
Expression | Description | Result |
/bookstore/* | Selects all child elements of bookstore | All <book> elements |
//* | Selects all elements in the document | All elements in the document |
//title[@*] | Selects all title elements with any attribute | All <title> elements with attributes |
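For comparison, here is one way the wildcard queries could be approximated in Python. It is a sketch assuming the standard-library xml.etree.ElementTree module; the [@*] predicate is not part of its XPath subset, so that check is done by inspecting each element's attributes. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/*: every child element of the root, whatever its name
children = root.findall("*")

# //*: every element in the document; iter() starts with the root itself
everything = list(root.iter())

# //title[@*]: title elements that carry at least one attribute
titles_with_attrs = [t for t in root.iter("title") if t.attrib]

print([c.tag for c in children])
print(len(everything), len(titles_with_attrs))
```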
Selecting Several Paths
By using the | operator in an XPath expression, you can select several paths.
Expression | Description | Result |
//title|//price | Selects all title and price elements in the document | All <title> and <price> elements |
/bookstore/book[1]|/bookstore/book[4] | Selects both the first and fourth book elements | The books "Everyday Italian" and "Learning XML" |
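The union operator has no direct equivalent in Python's standard-library xml.etree.ElementTree, which this sketch assumes. One possible workaround is to run the paths separately and combine the results, or to walk the tree once when document order matters. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# //title|//price: a single pass over the tree keeps document order
titles_and_prices = [el for el in root.iter() if el.tag in ("title", "price")]

# /bookstore/book[1]|/bookstore/book[4]: two queries, concatenated
first_and_fourth = root.findall("book[1]") + root.findall("book[4]")

print([el.tag for el in titles_and_prices])
print([b.findtext("title") for b in first_and_fourth])  # ['Everyday Italian', 'Learning XML']
```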
Commonly Used XPath Functions
1. Selecting Text with text()
If you want to extract the text inside an element (for example, the name of a product or the label on a button), text() helps you do just that. Strictly speaking, text() is a node test rather than a function, but it is used in much the same way.
Example: //title/text() selects the text of all <title> elements.
2. Using contains() to Match Partial Text or Attributes
Sometimes you want to match something even if you only know part of it, like a button that says “Sign in now” or a class name that changes slightly.
That’s where contains() comes in. It lets you match partial strings.
Example: //title[contains(text(), 'XML')] selects titles whose text contains 'XML'. In the sample document, only "Learning XML" matches.
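To see what text() and contains() return for the sample data, the two queries can be imitated in Python. This sketch assumes the standard-library xml.etree.ElementTree module, which supports neither text() nor contains(), so the element's .text attribute and the in operator stand in for them. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# //title/text(): the text content of every title element
title_texts = [t.text for t in root.iter("title")]

# //title[contains(text(), 'XML')]: substring match, case-sensitive like XPath
xml_titles = [t.text for t in root.iter("title") if "XML" in (t.text or "")]

print(title_texts)
print(xml_titles)  # ['Learning XML']
```

Note that "XQuery Kick Start" is not matched, because the substring "XML" never appears in it.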
3. Using position() to Choose Elements by Order
What if there are many similar elements, and you only want the first or second one? position() lets you select elements based on their order in the document.
Example: /bookstore/book[position()=2] selects the second book.
4. Using last() to Get the Final Match
Want to select the last item in a list? last() returns the position of the last node in the current node set, so using it as a predicate selects that node.
Example: /bookstore/book[last()] selects the last book.
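Positional selection can be sketched in Python as well. The example assumes the standard-library xml.etree.ElementTree module, which accepts the shorthand forms [2], [last()], and [last()-1] but not the position() function itself, so a range such as position()<3 is expressed with list slicing. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/book[position()=2], written in the shorthand form [2] (1-based)
second = root.find("book[2]")

# /bookstore/book[last()]
last = root.find("book[last()]")

# /bookstore/book[last()-1]: the one before the last
next_to_last = root.find("book[last()-1]")

# /bookstore/book[position()<3]: Python slices are 0-based, so [:2] is the first two
first_two = root.findall("book")[:2]

print(second.findtext("title"), "|", last.findtext("title"))
```

XPath positions start at 1, while Python indexes start at 0, which is an easy place to be off by one when mixing the two.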
5. Using and, or, and not() for Logical Conditions
These logical operators let you combine or negate conditions.
and: Both conditions must be true
or: Either condition can be true
not(): The condition must not be true
Example: You can use //book[price>30 and price<50] to select books priced between 30 and 50. This XPath returns all <book> nodes that fall within the specified range:
<book category="web">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>
<book category="web">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>
💡 If you need to retrieve the title of the book, try adding /title/text() at the end of this XPath.
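The same conditions can be checked in Python to confirm which books qualify. This is a sketch assuming the standard-library xml.etree.ElementTree module, which does not evaluate and, or, or not() inside predicates, so the logic is written as ordinary Python conditions. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)
books = root.findall(".//book")

# //book[price>30 and price<50]/title/text()
in_range = [
    b.findtext("title")
    for b in books
    if float(b.findtext("price")) > 30 and float(b.findtext("price")) < 50
]

# //book[not(@category='web')]/title/text()
not_web = [b.findtext("title") for b in books if not b.get("category") == "web"]

print(in_range)  # ['XQuery Kick Start', 'Learning XML']
print(not_web)   # ['Everyday Italian', 'Harry Potter']
```

"Everyday Italian" costs exactly 30.00, so the strict comparison price>30 leaves it out.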
6. following-sibling:: and preceding-sibling::
These axes help you navigate between sibling elements on the same level of the document tree.
For example, suppose your automation needs to capture the <price> element that follows a specified <title>: //title[.='Learning XML']/following-sibling::price
This finds the price that comes after the title Learning XML within the same book.
To go the other way and find the <title> that comes before a given <price>, you can use: //price[.='39.95']/preceding-sibling::title
This selects the <title> of the book whose price is 39.95.
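Sibling navigation can be imitated in Python by slicing a parent's list of children. This sketch assumes the standard-library xml.etree.ElementTree module, which has no sibling axes; the helper functions following_siblings and preceding_siblings are hypothetical names introduced for illustration. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)


def following_siblings(parent, node):
    """Children of parent that come after node (hypothetical helper)."""
    children = list(parent)
    return children[children.index(node) + 1:]


def preceding_siblings(parent, node):
    """Children of parent that come before node (hypothetical helper)."""
    children = list(parent)
    return children[:children.index(node)]


# //title[.='Learning XML']/following-sibling::price
book = root.find("book[title='Learning XML']")
title = book.find("title")
prices_after = [el.text for el in following_siblings(book, title) if el.tag == "price"]

# //price[.='39.95']/preceding-sibling::title
priced_book = root.find("book[price='39.95']")
price = priced_book.find("price")
titles_before = [el.text for el in preceding_siblings(priced_book, price) if el.tag == "title"]

print(prices_after, titles_before)  # ['39.95'] ['Learning XML']
```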
XPath Operators
Operators enhance expression power:
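Operator | Usage | Example | Result |
= | Equals | //book[@category='web'] | Web books |
!= | Not equals | //book[@category!='web'] | Non-web books |
<, <=, >, >= | Numeric comparisons | //book[price>40]/title | XQuery Kick Start |
and, or | Logical | //book[price>30 and price<50] | 2 books whose prices fall within the specified range |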
XPath Axes
Axes let you move in relation to nodes (parents, children, siblings).
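Axis | Description | Example | Result |
child:: | Direct children | /bookstore/child::book | All <book> elements |
parent:: | Parent of node | //title[.='Learning XML']/parent::book | The <book> element containing that title |
following-sibling:: | Siblings after the node | //title[.='Learning XML']/following-sibling::price | <price>39.95</price> |
preceding-sibling:: | Siblings before the node | //price[.='39.95']/preceding-sibling::title | <title lang="en">Learning XML</title> |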
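The child and parent axes map fairly directly onto Python's standard-library xml.etree.ElementTree module, which this sketch assumes: a plain tag name acts as a child step, and .. steps up to the parent. The full axis syntax (child::, parent::) is not accepted by that module. The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# /bookstore/child::book: a bare tag name is a child step
child_books = root.findall("book")

# //price/parent::*: ".." moves from each price up to its parent element
price_parents = root.findall(".//price/..")

print(len(child_books), [p.tag for p in price_parents])
```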
XPath Functions Cheat Sheet
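Function | Description | Example |
text() | Selects the text content of a node | //title/text() |
contains() | Checks if a string contains a substring | //title[contains(text(), 'XML')] |
starts-with() | Checks if a string starts with a substring | //title[starts-with(text(), 'Harry')] |
normalize-space() | Removes leading and trailing whitespace and collapses repeated whitespace | //title[normalize-space()='Harry Potter'] |
string-length() | Returns the length of a string | //title[string-length()>12] |
position() | Returns the position of the current node | /bookstore/book[position()=2] |
last() | Returns the position of the last node in a node set | /bookstore/book[last()] |
not() | Negates a condition | //book[not(@category='web')] |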
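To get a feel for what the string functions in this cheat sheet compute, their behavior can be approximated with plain Python string operations. This is only a sketch: it assumes the standard-library xml.etree.ElementTree module for parsing, and normalize_space is a hypothetical helper whose whitespace handling is close to, but not guaranteed identical to, XPath's normalize-space(). The XML is a trimmed copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (an assumption made to keep the sketch short)
XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)
titles = [t.text for t in root.iter("title")]


def normalize_space(value):
    """Trim the ends and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(value.split())


# //title[starts-with(text(), 'Harry')]
starts_with_harry = [t for t in titles if t.startswith("Harry")]

# //title[string-length()>12]
long_titles = [t for t in titles if len(t) > 12]

# normalize-space() applied to a padded string
cleaned = normalize_space("  Harry   Potter ")

print(starts_with_harry, long_titles, repr(cleaned))
```

"Harry Potter" and "Learning XML" are each exactly 12 characters long, so the strict comparison excludes them.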
To sum up, XPath may look intimidating at first glance, but most tasks require only a handful of expressions and functions. By understanding how paths, predicates, and common functions work, you can start writing more reliable and precise expressions. Practice with real XML or HTML documents, experiment in browser DevTools, and come back to this guide as a quick reference when you're stuck. Happy XPath-ing!