Understanding XPath: A Simple Introduction for Beginners

Written by Sophie
Updated over a week ago

If you’ve ever found that your automation fails to extract the right data from a webpage — or nothing at all — it might be time to get to know XPath.

Don’t worry, XPath isn’t as technical as it sounds. It’s just a way to tell Octoparse AI exactly where your data is on the page — like giving it directions.


What is XPath in simple terms?

Web pages are made of HTML, which works like a tree structure — each part of the page is a node on that tree. XPath is a language that helps you find a specific node, like a block of text, an image, or a button, in that HTML tree.

A Simple Example

Imagine you're looking at this part of a webpage:

<div class="product"> 
<h2 class="title">iPhone 15</h2>
<span class="price">$999</span>
</div>

And you want to extract the product name — iPhone 15.

The XPath might look like this:

//div[@class="product"]/h2[@class="title"]

What this means:

  • //div[@class="product"] → Anywhere on the page, find a div whose class is exactly “product”

  • /h2[@class="title"] → Directly inside that div, find the h2 whose class is “title”

Simple, right? You don’t need to memorize syntax — just understand that XPath describes where the data lives in the page.
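If you'd like to see the expression at work outside Octoparse AI, here is a minimal Python sketch that runs the same XPath against the snippet above. It assumes Python's built-in xml.etree.ElementTree module, which supports only a small subset of XPath and needs a leading dot (.//) when searching from an element. That module is chosen purely for illustration and is not necessarily how Octoparse AI evaluates XPath.

```python
import xml.etree.ElementTree as ET

# The snippet from the example, wrapped in <body> so it has a single root element.
HTML = """
<body>
  <div class="product">
    <h2 class="title">iPhone 15</h2>
    <span class="price">$999</span>
  </div>
</body>
""".strip()

root = ET.fromstring(HTML)

# Same idea as //div[@class="product"]/h2[@class="title"].
# ElementTree requires the leading "." when searching from an element.
title = root.find(".//div[@class='product']/h2[@class='title']")

product_name = title.text
print(product_name)
```

The path is read left to right: first locate the product block, then step into the title inside it.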


Why XPath Matters in Web Automation

When you browse a webpage, your eyes can easily spot what you want. An automation tool, however, needs a precise instruction, and that’s where XPath comes in. XPath tells the robot: “Go here, then grab this.”

In Octoparse AI, XPath helps find and interact with web elements. If you want to extract prices, click buttons, or scrape product names — XPath helps the tool know exactly where to look.

You can also view an element's XPath with your browser's developer tools:

  1. Open browser DevTools: Right-click on a web element (like a name, price, or image), then click Inspect. This will open the DevTools panel.

  2. Select the element: Use the Elements tab in DevTools to navigate through the HTML structure. You can hover over elements in the HTML to see them highlighted on the webpage. Alternatively, you can use the Select an element tool (a mouse pointer icon in the top left of the DevTools panel) to click on the element directly in the webpage.

  3. Copy XPath: Once you've selected the desired element in the Elements tab, right-click on the highlighted HTML in DevTools. Hover over Copy and then select Copy XPath. This will copy the XPath of the selected element to your clipboard.

  4. Use XPath: You can now paste the copied XPath wherever you need it, such as in testing tools and Octoparse AI.


Do I need to write XPath myself?

Good news: you don’t need to write XPath from scratch. Octoparse AI automatically creates XPath behind the scenes whenever you select a web element — like a title, link, or button.

When you use the + Capture step in the workflow, the system auto-generates an XPath and attaches it to the action. You can open the element editor on the right side of the workflow to view or tweak this XPath.

There’s also a Get Web Element by XPath command, where you can paste your own XPath if you want more control. And for more dynamic cases, Get Relative Element on Web Page helps you locate child, parent, or nearby elements using relative XPath, which we’ll explain in a later tutorial.

But sometimes:

  • The capture tool can’t find the data

  • It extracts the wrong part

  • Or it misses something completely

That’s when you can step in and adjust the XPath to fix it.
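As a rough illustration of what "adjusting the XPath" means, the sketch below compares a broad expression with a more specific one. The old-price and discount class names and the $1099 value are made up for this example, and Python's built-in xml.etree.ElementTree (a limited XPath subset) stands in for a real browser page.

```python
import xml.etree.ElementTree as ET

# Hypothetical product card containing two similar <span> elements.
HTML = """
<div class="product">
  <h2 class="title">iPhone 15</h2>
  <span class="old-price">$1099</span>
  <span class="price">$999</span>
</div>
""".strip()

root = ET.fromstring(HTML)

# Too broad: matches every <span> in the card, so the first hit is the old price.
broad = [el.text for el in root.findall(".//span")]

# More specific: only the <span> whose class is exactly "price".
specific = [el.text for el in root.findall(".//span[@class='price']")]

# An XPath that matches nothing returns an empty result here rather than an error.
missing = root.findall(".//span[@class='discount']")

print(broad)
print(specific)
print(missing)
```

Adding one attribute condition is often enough to turn an ambiguous match into the single element you wanted.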


Where to see and edit XPath in Octoparse AI?

Whenever you add a step to capture an element on a web page, Octoparse AI will generate and save its XPath. To view or edit it:

  • Go to the Workflow Editor and open the Asset panel on the right.

  • Double-click any captured element to open the Element Editor, where you’ll see the XPath.

  • Or open the Element Editor from the command's parameter settings.

You can also use the Get Web Element by XPath command when you already know the XPath or want to control it more directly.

For more complex relationships, such as selecting the price that comes after a product title, the Get Relative Element on Web Page command lets you use XPath to define how elements are connected.
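The idea behind relative lookups can be sketched in Python: find each product block first, then search for the title and price relative to that block rather than the whole page. This is only an analogy for what a relative-element step does, not Octoparse AI's implementation. It assumes the built-in xml.etree.ElementTree module, and the second product and its price are invented sample data.

```python
import xml.etree.ElementTree as ET

# Two product blocks; the second one is invented sample data.
HTML = """
<body>
  <div class="product">
    <h2 class="title">iPhone 15</h2>
    <span class="price">$999</span>
  </div>
  <div class="product">
    <h2 class="title">iPhone 15 Pro</h2>
    <span class="price">$1199</span>
  </div>
</body>
""".strip()

root = ET.fromstring(HTML)

prices_by_title = {}
for product in root.findall(".//div[@class='product']"):
    # "./" means "start from this product block", not from the top of the page.
    title = product.find("./h2[@class='title']")
    price = product.find("./span[@class='price']")
    prices_by_title[title.text] = price.text

print(prices_by_title)
```

Because each price is looked up from its own product block, titles and prices stay paired even when the page lists many products.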


Questions you might encounter

Q: If I don’t know how to write XPath, can I still use Octoparse AI? A: Yes! Octoparse AI automatically generates XPath for you in most cases. You can simply click on the data you want — no manual coding needed.

Q: Why didn’t it extract the data I wanted? A: The default XPath might not be accurate enough because some pages have tricky structures. Try tweaking the XPath manually if needed.

Q: It pulled the wrong field — what went wrong? A: There may be multiple similar elements on the page. You can narrow it down by writing a more specific XPath.

Q: What happens if I write the XPath wrong? A: No worries. The worst-case scenario is that no data gets extracted — nothing will crash. You can revise the XPath or go back to the default one.

Q: I got an “Element not found” error — what should I do? A: This could happen if the page hasn't fully loaded yet. Try increasing the wait time or double-check if the XPath still matches the element.

Q: I copied the XPath from the browser. Is it the same as what Octoparse AI uses? A: Not exactly. The XPath a browser copies is often tied to the element's exact position in the page, so it can break easily if the page changes. Octoparse AI tries to generate more stable paths.
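To illustrate the last answer, here is a small Python sketch of why a position-based XPath can be fragile. The page markup, the banner, and both path strings are hypothetical, and the built-in xml.etree.ElementTree module (a limited XPath subset) is assumed in place of a real browser.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """
<body>
  <div class="header">Shop</div>
  <div class="product">
    <h2 class="title">iPhone 15</h2>
    <span class="price">$999</span>
  </div>
</body>
""".strip()

# The same page after a hypothetical banner is added near the top.
UPDATED = ORIGINAL.replace(
    '<div class="header">Shop</div>',
    '<div class="header">Shop</div><div class="banner">Sale!</div>',
)

POSITIONAL = "./div[2]/span[1]"           # counts its way down the tree
BY_ATTRIBUTE = ".//span[@class='price']"  # describes the element itself


def first_text(html, path):
    """Return the text of the first match, or None if nothing matches."""
    match = ET.fromstring(html).find(path)
    return None if match is None else match.text


positional_before = first_text(ORIGINAL, POSITIONAL)
positional_after = first_text(UPDATED, POSITIONAL)
attribute_before = first_text(ORIGINAL, BY_ATTRIBUTE)
attribute_after = first_text(UPDATED, BY_ATTRIBUTE)

print(positional_before, positional_after)
print(attribute_before, attribute_after)
```

After the banner appears, the second div is no longer the product block, so the positional path finds nothing, while the attribute-based path still reaches the price.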


Tips for XPath beginners

  • Start with Octoparse AI’s automatic capture first

  • Only adjust XPath if something’s missing or incorrect

  • You don’t need to learn it all — just enough to solve your own case

  • Reach out to the Octoparse AI support team for help

XPath might sound technical at first, but once you use it a few times, it becomes a powerful tool for fine-tuning your automations.
