When working with XPath in web automation, you’ll often hear two terms thrown around: absolute XPath and relative XPath. But what exactly do they mean? And which one should you use?
In this article, we’ll break down the difference, show examples of both, and help you understand which type of XPath is more reliable for your use case — especially when working with Octoparse AI.
What Is an Absolute XPath?
An absolute XPath is a complete path that starts from the root of the HTML document and follows every step through the tree to the target element. It’s like giving someone turn-by-turn directions from the very beginning — no shortcuts allowed.
Here’s an example of an absolute XPath. Note the single leading slash, which anchors the path at the document root:
/html/body/div[1]/div[2]/div[3]/span
This path says: start at the <html> element, go into <body>, take its first <div>, then the second <div> inside that one, then the third <div> inside that one, and finally reach a <span>.
This kind of XPath depends on the exact structure of the page — every layer must be in the right place for it to work.
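To make the step-by-step walk concrete, here is a minimal sketch using Python’s standard-library xml.etree.ElementTree, which supports only a subset of XPath. The PAGE markup is a made-up, well-formed snippet, not a real site. ElementTree evaluates paths relative to the element you call find() on, so the /html prefix is written as "." starting from the <html> root; that translation is an assumption of this sketch, not something a full XPath engine requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the <span> sits at body > div[1] > div[2] > div[3].
PAGE = """
<html>
  <body>
    <div>
      <div>header</div>
      <div>
        <div>first</div>
        <div>second</div>
        <div><span>19.99</span></div>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Same steps as an absolute path, expressed from the root element.
span = root.find("./body/div[1]/div[2]/div[3]/span")
absolute_text = span.text if span is not None else None

print(absolute_text)  # 19.99
```

Every step in the path has to line up with the markup; remove or reorder any one of those <div> layers and find() returns None.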
What Is a Relative XPath?
A relative XPath starts from anywhere in the page structure, usually anchored on a known element or attribute. It typically begins with //, which searches the whole document, so you can jump straight to what you need without spelling out every layer above it.
Here’s what a relative XPath might look like:
//span[@class="price"]
This means: find every <span> on the page whose class attribute is exactly price, no matter where it’s located in the tree.
Relative XPath is much more adaptable — it doesn't care about how many layers deep the element is, just that the conditions match.
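As a rough illustration, the sketch below runs the same relative expression against two made-up documents with very different nesting. It again uses Python’s standard-library xml.etree.ElementTree, where a search from an element is written with a leading "."; the markup and the find_prices helper are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages: the price sits at very different depths.
SHALLOW = '<html><body><span class="price">19.99</span></body></html>'
DEEP = (
    '<html><body><div><section><p>'
    '<span class="price">24.50</span>'
    '<span class="price sale">9.99</span>'
    '</p></section></div></body></html>'
)

def find_prices(markup):
    """Return the text of every <span> whose class attribute equals 'price'."""
    root = ET.fromstring(markup)
    return [span.text for span in root.findall(".//span[@class='price']")]

print(find_prices(SHALLOW))  # ['19.99']
print(find_prices(DEEP))     # ['24.50']
```

Depth makes no difference to the match. Note that the attribute test is an exact comparison, so the <span> with class "price sale" is not selected.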
Pros and Cons of Each
Aspect | Absolute XPath | Relative XPath |
Precision | Very precise — follows exact structure | Flexible — based on attributes or content |
Fragility | Breaks easily if the page layout changes | More resilient to layout changes |
Readability | Often long and hard to read | Shorter and easier to understand |
Use case | Stable internal layouts, like forms | Dynamic or frequently updated pages |
In general, absolute XPath is more likely to break when websites update their structure — and most websites do that often. That’s why relative XPath is usually a better choice for web scraping.
Practical Example: Locating an Image Element
Let’s use a familiar example. Say you’re scraping a product page and want to extract the main image:
Absolute XPath:
/html/body/div[2]/div[1]/div[1]/div[4]/div[3]/div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/ul[1]/li[1]/span[1]/span[1]/div[1]
This may work — but only if the structure never changes. If a banner is added to the top of the page or the layout shifts, this XPath might point to the wrong place.
Relative XPath:
//div[@id="imgTagWrapperId"]
This will continue working as long as the element’s id stays the same, even if other elements are added or moved around.
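The difference in resilience can be sketched with a small experiment. The page below is a heavily shortened, hypothetical stand-in for a product page (the real path is far longer), and the banner is an invented layout change. The sketch uses Python’s standard-library xml.etree.ElementTree, so both paths are written relative to the <html> root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product page.
PAGE = """
<html>
  <body>
    <div>
      <ul>
        <li>
          <span class="a-list-item">
            <div id="imgTagWrapperId"><img src="product.jpg"/></div>
          </span>
        </li>
      </ul>
    </div>
  </body>
</html>
"""

ABSOLUTE = "./body/div[1]/ul[1]/li[1]/span[1]/div[1]"
RELATIVE = ".//div[@id='imgTagWrapperId']"

def locate(root, path):
    """Return the id of the element the path selects, or None if nothing matches."""
    found = root.find(path)
    return None if found is None else found.get("id")

root = ET.fromstring(PAGE)
before = (locate(root, ABSOLUTE), locate(root, RELATIVE))

# Simulate a redesign: a banner becomes the first <div> inside <body>.
body = root.find("body")
body.insert(0, ET.Element("div", {"class": "banner"}))
after = (locate(root, ABSOLUTE), locate(root, RELATIVE))

print(before)  # ('imgTagWrapperId', 'imgTagWrapperId')
print(after)   # (None, 'imgTagWrapperId')
```

After the banner is inserted, div[1] refers to the banner, so the positional path finds nothing, while the id-based lookup is unaffected.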
If you want to be even more specific, you could write:
//span[@class="a-list-item"]//div[@id="imgTagWrapperId"]
This tells XPath: “Find a div with the id imgTagWrapperId anywhere inside a span with the class a-list-item.” Scoping the search this way gives you more control and reduces the risk of false matches.
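Here is one way the scoping might play out, assuming a hypothetical page where a template reuses the same id in a recommendations block (invalid HTML, but used here purely to show the effect). The markup and file names are invented for this sketch, which again relies on Python’s standard-library xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with the same id appearing in two places.
PAGE = """
<html>
  <body>
    <div class="recommendations">
      <div id="imgTagWrapperId"><img src="other.jpg"/></div>
    </div>
    <ul>
      <li>
        <span class="a-list-item">
          <div id="imgTagWrapperId"><img src="product.jpg"/></div>
        </span>
      </li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Broad: any <div> with the id, anywhere in the document.
broad = root.findall(".//div[@id='imgTagWrapperId']")
# Scoped: only <div>s with the id that sit inside the a-list-item <span>.
scoped = root.findall(".//span[@class='a-list-item']//div[@id='imgTagWrapperId']")

broad_srcs = [div.find("img").get("src") for div in broad]
scoped_srcs = [div.find("img").get("src") for div in scoped]

print(broad_srcs)   # ['other.jpg', 'product.jpg']
print(scoped_srcs)  # ['product.jpg']
```

The extra container condition costs a little flexibility, since the expression now also depends on the span’s class, but it filters out the unrelated match.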
Final Thoughts
Choosing between absolute and relative XPath isn’t about right or wrong — it’s about knowing what’s more stable and practical. Absolute XPath can be useful for simple, unchanging structures, but in most real-world scraping scenarios, relative XPath gives you the flexibility and reliability you need.
Now that you understand the difference, let’s take it a step further. In the next part of this series, we’ll explore XPath expressions you’ll actually use — like how to match text, select by index, or combine conditions — so you can write smarter, more powerful XPath in your workflows.
Let’s keep going — XPath gets more powerful the deeper you go.