Extract data from web page

Written by Sophie
Updated over 2 weeks ago

Definition and Usage

Extract structured data from web pages such as tables, lists, and other formatted content. This command allows you to scrape data from websites and save it either as a data table variable or to an Excel file for further processing.
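To make "structured data" concrete, the sketch below pulls the rows of an HTML table into a list of lists using only the Python standard library. It illustrates the general idea and is not how this command works internally; the `TableExtractor` class, the `extract_table` helper, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table(html):
    """Return the table rows found in html as a list of lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page content.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
```

Here the first row holds the column headers and the remaining rows hold the data, which is roughly the shape a data table variable or a worksheet would take.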


Parameter Values

Input parameters

| Parameter | Description | Possible Values | Required | Options / Notes |
| --- | --- | --- | --- | --- |
| Web page | Select a variable that contains the web page to work with | | Yes | |
| Data fields | Define the data fields to extract | | Yes | Use "Select data fields" button to specify the data to extract |
| Pagination | Specify how to handle multi-page data | None, Next page, Load more | Yes | |
| Next page | For multi-page data extraction, specify the pagination button | | Only if Pagination = "Next page" | Avoid using specific page numbers |
| Load more | Specify the 'Load More' button to click for additional data | | Only if Pagination = "Load more" | |
| Scope | Define the scope of data extraction | All, Number of rows | Yes | |
| Save data to | Choose where to save the extracted data | Data Table, Excel | Yes | |
| Save location | Select the location to save the extracted data | | Only if Save data to = "Excel" | |
| Export with header | Include column headers in the exported data | Checked/Unchecked | No | |
| Specify sheet | Specify a particular worksheet | Checked/Unchecked | Only if Save data to = "Excel" | |
| Sheet name | Specify the worksheet to write to | | Only if Specify sheet = Checked | Default is Sheet1 |
| Append | Add data to existing content | Checked/Unchecked | No | |
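The interaction between Pagination, Scope, Export with header, and Append can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pages are a hard-coded list standing in for a real site, a CSV file stands in for the Excel workbook, and `collect_rows` and `save_rows` are hypothetical helpers, not part of the product. In particular, skipping the header when appending to a file that already has content is a choice made for this sketch, not documented behavior.

```python
import csv
import os
import tempfile


def collect_rows(pages, pagination="Next page", max_rows=None):
    """Gather rows page by page, stopping once max_rows is reached.

    pages: list of pages, each a list of rows (stand-in for a real site).
    pagination: "None" reads only the first page; "Next page" walks all pages.
    max_rows: None mirrors Scope = "All"; an int mirrors "Number of rows".
    """
    collected = []
    for index, page in enumerate(pages):
        if index > 0 and pagination == "None":
            break
        for row in page:
            if max_rows is not None and len(collected) >= max_rows:
                return collected
            collected.append(row)
    return collected


def save_rows(path, header, rows, with_header=True, append=False):
    """Write rows to a CSV file, optionally with a header, optionally appending."""
    file_has_data = os.path.exists(path) and os.path.getsize(path) > 0
    mode = "a" if append else "w"
    with open(path, mode, newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Sketch-level choice: do not repeat the header when appending to existing data.
        if with_header and not (append and file_has_data):
            writer.writerow(header)
        writer.writerows(rows)


# Hypothetical two-page data set.
PAGES = [
    [["Widget", "9.99"], ["Gadget", "24.50"]],
    [["Gizmo", "5.00"], ["Doohickey", "12.75"]],
]

first_three = collect_rows(PAGES, pagination="Next page", max_rows=3)
first_page_only = collect_rows(PAGES, pagination="None")

out_path = os.path.join(tempfile.mkdtemp(), "products.csv")
save_rows(out_path, ["Product", "Price"], first_three[:2])
save_rows(out_path, ["Product", "Price"], first_three[2:], append=True)

with open(out_path, newline="", encoding="utf-8") as handle:
    saved = list(csv.reader(handle))
```

With a row limit of 3, collection stops partway through the second page, and the appended write adds the third row beneath the existing content instead of overwriting it.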

Advanced settings

| Parameter | Description | Possible Values | Required | Options / Notes |
| --- | --- | --- | --- | --- |
| Scroll area | Define the area to scroll for data extraction | Full page, Specific area | Yes | |
| Scroll on | Select the element area you need to scroll | | Only if Scroll area = "Specific area" | |
| Scroll type | Determine how the page is scrolled | To the bottom, Screen by screen | Yes | "Screen by screen" is slower but more thorough |
| Pagination interval (s) | Timeout in seconds for content loading after clicking pagination button | | Yes | Default is 1 second |
| Simulate human pagination click | Simulate mouse actions for pagination clicks | Checked/Unchecked | No | Ensure target elements are visible if checked |
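The role of the pagination interval is easiest to see as a loop: read the page, click the pagination button, wait, and read again. The sketch below assumes a hypothetical `FakeSite` object in place of a real browser and a hypothetical `paginate` helper; the wait function is injectable so the example runs instantly. It is a conceptual model, not the command's actual implementation.

```python
import time


def paginate(read_page, click_next, interval_s=1.0, sleep=time.sleep):
    """Read a page, click next, wait interval_s seconds, and repeat.

    click_next must return False when there is no further page.
    """
    rows = list(read_page())
    while click_next():
        sleep(interval_s)  # give the newly loaded content time to appear
        rows.extend(read_page())
    return rows


class FakeSite:
    """Hypothetical stand-in for a browser showing one page at a time."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0

    def read_page(self):
        return self.pages[self.current]

    def click_next(self):
        if self.current + 1 >= len(self.pages):
            return False
        self.current += 1
        return True


waits = []  # records each wait instead of actually sleeping
site = FakeSite([["a", "b"], ["c"], ["d", "e"]])
all_rows = paginate(site.read_page, site.click_next, interval_s=2.0, sleep=waits.append)
```

One wait happens per pagination click, so a three-page extraction with a 2-second interval spends about 4 seconds waiting. A longer interval trades speed for a better chance that slow content has finished loading before it is read.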

Error handling

| Parameter Name | Description |
| --- | --- |
| Throw error & stop | When an error occurs, the action will trigger an error and stop the execution of the entire app. |
| Retry command | If an error occurs, the action will retry the command in an attempt to resolve the issue and continue the process. |
| Ignore error & continue | When an error occurs, the action will be ignored, and the workflow will continue without interruption. |
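The three error-handling options correspond to three familiar control-flow patterns. The sketch below is a rough analogy in Python; `run_with_policy`, the policy names, the `retries` count, and the `Flaky` test action are all hypothetical, and the number of retries the product actually performs is not stated here.

```python
def run_with_policy(action, policy="throw", retries=3):
    """Run action() under one of three error-handling policies.

    "throw":  let the error propagate and stop everything.
    "retry":  call the action again, up to `retries` attempts in total.
    "ignore": swallow the error and return None so the caller can continue.
    """
    if policy == "throw":
        return action()
    if policy == "retry":
        attempts = max(1, retries)
        for attempt in range(attempts):
            try:
                return action()
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
    elif policy == "ignore":
        try:
            return action()
        except Exception:
            return None
    else:
        raise ValueError(f"unknown policy: {policy!r}")


class Flaky:
    """Hypothetical action that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("page not ready")
        return "extracted"


flaky = Flaky(failures=2)
retry_result = run_with_policy(flaky, "retry", retries=3)
ignored_result = run_with_policy(Flaky(failures=1), "ignore")
```

Retrying suits transient problems such as a page that has not finished loading, while ignoring is only safe when later steps can cope with missing data.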

Variables produced

Store data into: Stores the extracted data as a data table variable that can be used in subsequent actions.


Using Variables in Conditions

Variables can be used in this command by clicking on the {x} icon or variable selector where available. For example, you can use variables for the web page input, save location, or pagination interval. When using variables, ensure that the variable type matches the expected input type for the parameter.


Notes

  • Before using this command, ensure you have a valid web page opened or referenced in a variable.

  • For pagination to work properly, the "Next page" or "Load more" button must appear in the same position on every page.

  • When extracting large amounts of data, consider using the "Screen by screen" scroll type to ensure all content is properly loaded.

  • If data extraction is incomplete or inconsistent, try increasing the pagination interval to allow more time for content to load.

  • The "Simulate human pagination click" option may be necessary for websites that detect and block automated interactions.

  • For Excel output, ensure you have proper write permissions to the specified save location.
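For the write-permission note above, a quick pre-flight check outside the tool might look like the following. This is a sketch only: `can_write_to` is a hypothetical helper, the file name is made up, and `os.access` gives an approximate answer on some platforms, so treat a positive result as a hint and not a guarantee.

```python
import os
import tempfile


def can_write_to(save_path):
    """Return True if the folder that would hold save_path exists and looks writable."""
    folder = os.path.dirname(os.path.abspath(save_path))
    return os.path.isdir(folder) and os.access(folder, os.W_OK)


# Hypothetical save locations: one inside a fresh temp folder, one in a missing folder.
writable_folder = tempfile.mkdtemp()
ok = can_write_to(os.path.join(writable_folder, "export.xlsx"))
missing = can_write_to(os.path.join(writable_folder, "no_such_folder", "export.xlsx"))
```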
