
2. One-Click Data Extraction

Extract structured data from web pages with a single automated action

Written by Sophie
Updated this week

If you’re looking to automate pulling specific data from web pages—such as e-commerce product details or social media posts—this guide will walk you through using the Extraction feature in the workflow editor. The tutorial focuses on batch extraction, with a brief note on per-item extraction as a complementary approach. By the end, you’ll know how to set up a reliable, repeatable process for collecting data from lists and tables.

Getting started with Extraction in the workflow editor

In the workflow editor, there are two simple ways to access data extraction. First, click the Extract Data button on the top command bar. Second, type “extract data” into the search box to quickly locate and insert the extraction action. This feature is the core instruction for bulk web data collection, designed to pull data from lists, tables, and similar structures efficiently.

Two main extraction modes

There are two primary methods to extract data from web pages: batch extraction and per-item extraction. Batch extraction is your main method for capturing many records from lists or tables in one go. Per-item extraction serves as a supplemental technique, used in combination with other commands to process items one by one when needed, which will be discussed in the following tutorial.

Batch extraction: the main method

This section covers the essential steps and options you’ll configure for batch data collection—the centerpiece of most workflows. See the command documentation for full details.

Two modes within batch extraction

  1. Smart extraction

  • What it is: Fully automatic recognition and extraction with a single operation.

  • How to use: On the target page, perform a Ctrl+Click to select the data area. This mode works well for standard lists and tables, such as product catalogs or user comment sections.

  • When to choose it: If the page structure is regular and you want a quick setup with minimal manual tuning.

  2. Precise extraction

  • What it is: Manual, precise selection of the fields you actually want to capture, reducing unnecessary data.

  • How to use: In the data preview, click Capture manually, then Ctrl+Click to define each column you wish to extract.

  • When to choose it: If Smart Extraction results are not ideal or the page contains complex layouts, conflicting elements, or many optional fields.
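The workflow editor is point-and-click rather than scripted, but as a rough mental model, the two modes differ in which fields survive. The Python sketch below is purely illustrative (the item data and all names are hypothetical, not the tool’s API): smart extraction keeps everything the tool recognizes, while precise extraction keeps only the columns you mapped.

```python
# Conceptual sketch only: the editor has no scripting API, so this models
# the two modes on plain dictionaries (all names are hypothetical).

items = [
    {"name": "Widget A", "price": "$9.99", "rating": "4.5", "badge": "Sponsored"},
    {"name": "Widget B", "price": "$14.99", "rating": "4.1", "badge": ""},
]

# Smart extraction: capture every field recognized on each item.
smart_result = [dict(item) for item in items]

# Precise extraction: keep only the columns you mapped via Ctrl+Click.
chosen_columns = ["name", "price"]
precise_result = [{col: item[col] for col in chosen_columns} for item in items]

print(precise_result[0])  # {'name': 'Widget A', 'price': '$9.99'}
```

The trade-off mirrors the guidance above: smart mode is fast but may capture noise (like the “Sponsored” badge here), while precise mode yields only the fields you asked for.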

Common settings you should know

  • Extraction targets: This is what you’ve defined as the data to capture during the setup steps above. Think of it as the list of fields you want from each item (for example, product name, price, rating, and availability).

  • Pagination options:

    • None: Use when the page uses infinite scrolling (load more as you scroll) or when there is no pagination.

    • Pagination button / Load more: Choose when you need to click a button or trigger a “load more” action to reveal additional items.

  • Scope of extraction:

    • All: Extract every matching item found on the page.

    • Specific rows or pages: Limit extraction to a defined range of items or a set number of pages.

  • Data destination:

    • In-software table: Save extracted data to the built-in table within the tool for further processing.

    • Export to Excel: Save a clean, portable spreadsheet file for sharing or further analysis.
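To see how these settings fit together, here is a standalone Python sketch of the same ideas, assuming a paginated source represented as nested lists (everything here is hypothetical; the real feature is configured in the UI, and its Excel export is represented by CSV in this sketch for portability):

```python
# Illustrative sketch of the batch-extraction settings as plain Python.
# All names are hypothetical; none of this is the tool's actual API.
import csv
import io

# Simulated paginated source: each inner list is one "page" of items.
pages = [
    [{"name": "Widget A", "price": "$9.99"}, {"name": "Widget B", "price": "$14.99"}],
    [{"name": "Widget C", "price": "$4.50"}],
]

max_pages = 2                 # "Scope of extraction": limit to a set number of pages
targets = ["name", "price"]   # "Extraction targets": the fields you mapped

rows = []
for page in pages[:max_pages]:   # "Pagination": walk the results page by page
    for item in page:
        rows.append({field: item[field] for field in targets})

# "Data destination": the tool offers an in-software table or Excel export;
# a CSV written to a string stands in for that step here.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=targets)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```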

Advanced settings: the key options to fine-tune

  • Scroll region: Some pages require scrolling to reveal all data. You can choose to scroll the entire page or limit scrolling to a specific area within the page. This helps gather data without triggering page layout issues.

  • Pagination delay: To avoid loading gaps or triggering anti-bot protections, introduce a delay between page navigations. A modest wait time often yields more reliable results.

  • Simulated human clicking: Enabling this makes page navigation and clicks resemble real user interactions more closely, reducing the chance that automated actions are blocked by the site.
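The tool applies pagination delays and human-like clicking for you, but the pacing idea is easy to picture. A minimal sketch, assuming a hypothetical page-visiting loop (not the tool’s implementation): a base wait plus random jitter between navigations makes the timing harder for anti-bot checks to fingerprint.

```python
# Sketch of a pagination delay with slight random jitter, the kind of pacing
# the "Pagination delay" option applies automatically. Illustrative only.
import random
import time

def paced_visits(page_count, base_delay=0.01, jitter=0.005):
    """Visit pages in order, waiting a slightly randomized delay between them."""
    visited = []
    for page in range(1, page_count + 1):
        visited.append(page)  # stand-in for navigating and extracting a page
        if page < page_count:
            # Randomized wait: a fixed interval is easy to detect; jitter is not.
            time.sleep(base_delay + random.uniform(0, jitter))
    return visited

print(paced_visits(3))  # [1, 2, 3]
```

In the real feature you would simply raise the delay setting; the jitter here illustrates why a modest, slightly variable wait tends to be more reliable than the fastest possible pace.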

Practical tips to maximize reliability

  • Start with Smart Extraction for straightforward pages. If results look incomplete or inaccurate, switch to Precise Extraction and manually map the fields.

  • When dealing with infinite scroll, prefer the None option and ensure your automation includes an appropriate scrolling sequence or an explicit trigger to load more data before continuing.

  • Use a reasonable pagination delay to balance speed and reliability. If a site slows down under automation, extend the wait time slightly.

  • Validate a small batch first. Extract a dozen items, review the data, and adjust your field mappings or selection approach before scaling up.

  • Regularly test on a few representative pages from the target site. Web layouts change often, so periodic checks help keep your workflow resilient.

Per-item extraction: a supplementary approach

While batch extraction handles bulk data efficiently, there are scenarios where you need to operate on items individually, perhaps to perform additional actions per item or to handle dynamic content that batch extraction cannot easily capture. In these cases, you can combine per-item extraction with other commands to process items one by one, applying precise transforms or conditional logic as you go.
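As a conceptual sketch of that item-by-item flow (hypothetical data and helper names; in practice you would chain the per-item extraction command with the tool’s loop and branch commands):

```python
# Sketch of per-item processing: handle each record individually, applying a
# transform and conditional logic before keeping it. Illustrative only.

def clean_price(raw):
    """Per-item transform: normalize a price string like '$1,299.00' to a float."""
    return float(raw.replace("$", "").replace(",", ""))

items = [
    {"name": "Widget A", "price": "$1,299.00"},
    {"name": "Widget B", "price": "$14.99"},
]

results = []
for item in items:            # one item at a time, unlike batch mode
    price = clean_price(item["price"])
    if price < 100:           # conditional logic applied per item
        results.append({"name": item["name"], "price": price})

print(results)  # [{'name': 'Widget B', 'price': 14.99}]
```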

In summary

The Data Extraction feature in the workflow editor is your primary tool for bulk web data collection. Use Smart Extraction for quick setup on regular pages, and switch to Precise Extraction when you need tight control over the fields. Configure pagination and scroll behavior carefully, and apply delays to keep interactions reliable. With these steps, you’ll be able to automate gathering lists and tables from websites, turning scattered web data into structured, usable information. If you encounter tricky pages, remember to mix batch and per-item approaches for the best results.
