Definition and Usage
Extracts structured data, such as tables, lists, and other formatted content, from web pages. This command scrapes data from a website and saves it either as a data table variable or to an Excel file for further processing.
Parameter Values
Input parameters
Parameter | Description | Possible Values | Required | Options / Notes |
Web page | Select a variable that contains the web page to work with | | Yes | |
Data fields | Define the data fields to extract | | Yes | Use "Select data fields" button to specify the data to extract |
Pagination | Specify how to handle multi-page data | None, Next page, Load more | Yes | |
Next page | For multi-page data extraction, specify the pagination button | | Only if Pagination = "Next page" | Avoid using specific page numbers |
Load more | Specify the 'Load More' button to click for additional data | | Only if Pagination = "Load more" | |
Scope | Define the scope of data extraction | All, Number of rows | Yes | |
Save data to | Choose where to save the extracted data | Data Table, Excel | Yes | |
Save location | Select the location to save the extracted data | | Only if Save data to = "Excel" | |
Export with header | Include column headers in the exported data | Checked/Unchecked | No | |
Specify sheet | Specify a particular worksheet | Checked/Unchecked | Only if Save data to = "Excel" | |
Sheet name | Specify the worksheet to write to | | Only if Specify sheet = Checked | Default is Sheet1 |
Append | Add data to existing content | Checked/Unchecked | No | |
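The conditional entries in the Required column can be easier to follow when written out as checks. The following is a minimal Python sketch of those rules, not the product's own validation logic. The function name `missing_parameters` and the plain-dictionary `settings` format are hypothetical and exist only for illustration; the keys mirror the parameter names in the table.

```python
def missing_parameters(settings: dict) -> list:
    """Return the names of required parameters that are absent or empty.

    `settings` is a hypothetical stand-in for the command's input form.
    """
    # Parameters the table marks as always required.
    required = ["Web page", "Data fields", "Pagination", "Scope", "Save data to"]

    # Parameters that become required only when another option is selected.
    if settings.get("Pagination") == "Next page":
        required.append("Next page")
    if settings.get("Pagination") == "Load more":
        required.append("Load more")
    if settings.get("Save data to") == "Excel":
        required.append("Save location")
    # "Sheet name" has a documented default (Sheet1), so it is not checked here.

    return [name for name in required if settings.get(name) in (None, "")]


example = {
    "Web page": "page_variable",
    "Data fields": ["title", "price"],
    "Pagination": "Next page",
    "Scope": "All",
    "Save data to": "Excel",
}
print(missing_parameters(example))
```

In this example the pagination button and the save location have not been supplied, so both names are reported as missing.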
Advanced settings
Parameter | Description | Possible Values | Required | Options / Notes |
Scroll area | Define the area to scroll for data extraction | Full page, Specific area | Yes | |
Scroll on | Select the element area you need to scroll | | Only if Scroll area = "Specific area" | |
Scroll type | Determine how the page is scrolled | To the bottom, Screen by screen | Yes | "Screen by screen" is slower but more thorough |
Pagination interval (s) | Timeout in seconds for content loading after clicking pagination button | | Yes | Default is 1 second |
Simulate human pagination click | Simulate mouse actions for pagination clicks | Checked/Unchecked | No | Ensure target elements are visible if checked |
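To illustrate how the pagination interval and the "Number of rows" scope might interact, here is a rough Python sketch of a generic extract, click, wait loop. It is an assumption about the general technique, not a description of how this command is implemented. The callbacks `read_rows` and `click_next`, and the in-memory `make_fake_site` helper, are hypothetical names used only for this demonstration.

```python
import time
from typing import Any, Callable, List, Optional


def extract_all_pages(
    read_rows: Callable[[], List[Any]],
    click_next: Callable[[], bool],
    pagination_interval: float = 1.0,
    max_rows: Optional[int] = None,
) -> List[Any]:
    """Collect rows page by page until no next page exists or max_rows is hit."""
    rows: List[Any] = []
    while True:
        rows.extend(read_rows())
        # Scope = "Number of rows": stop early once enough rows are collected.
        if max_rows is not None and len(rows) >= max_rows:
            return rows[:max_rows]
        # Stop when the pagination button can no longer be clicked.
        if not click_next():
            return rows
        # Give the next page time to load before reading it.
        time.sleep(pagination_interval)


def make_fake_site(pages):
    """Build in-memory stand-ins for the two callbacks (demo only)."""
    state = {"index": 0}

    def read_rows():
        return list(pages[state["index"]])

    def click_next():
        if state["index"] + 1 >= len(pages):
            return False  # last page reached: nothing left to click
        state["index"] += 1
        return True

    return read_rows, click_next


read_rows, click_next = make_fake_site([[1, 2], [3], [4, 5]])
print(extract_all_pages(read_rows, click_next, pagination_interval=0))
```

If the wait is too short for a slow site, the loop would read a page before its content has finished loading, which is why a longer interval can help with incomplete results.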
Error handling
Parameter Name | Description |
Throw error & stop | When an error occurs, the action will trigger an error and stop the execution of the entire app. |
Retry command | If an error occurs, the action will retry the command in an attempt to resolve the issue and continue the process. |
Ignore error & continue | When an error occurs, the error is ignored and the workflow continues without interruption. |
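The three options can be compared side by side in a short Python sketch. This is only an illustration of the general behaviour described in the table. The function `run_with_policy`, its policy names, the `max_retries` limit, and the `default` fallback are all assumptions, since the retry count and the result of an ignored error are not specified here.

```python
def run_with_policy(action, policy="throw", max_retries=3, default=None):
    """Run `action` under one of three hypothetical error-handling policies."""
    if policy == "throw":
        # Throw error & stop: let the exception propagate to the caller.
        return action()
    if policy == "retry":
        # Retry command: one initial attempt plus up to max_retries retries.
        last_error = None
        for _ in range(1 + max_retries):
            try:
                return action()
            except Exception as error:
                last_error = error
        raise last_error
    if policy == "ignore":
        # Ignore error & continue: swallow the error and return a fallback.
        try:
            return action()
        except Exception:
            return default
    raise ValueError(f"unknown policy: {policy}")


def make_flaky(failures):
    """Return an action that fails `failures` times, then succeeds (demo only)."""
    remaining = {"count": failures}

    def action():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("simulated extraction failure")
        return "ok"

    return action


print(run_with_policy(make_flaky(failures=2), policy="retry"))
```

With "Ignore error & continue", later steps should be prepared for the extracted data to be missing or empty.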
Variables produced
Store data into: Stores the extracted data as a data table variable that can be used in subsequent actions.
Using Variables in Conditions
Variables can be used in this command by clicking on the {x} icon or variable selector where available. For example, you can use variables for the web page input, save location, or pagination interval. When using variables, ensure that the variable type matches the expected input type for the parameter.
Notes
Before using this command, ensure you have a valid web page opened or referenced in a variable.
For pagination to work properly, the "Next page" or "Load more" button must be consistently located in the same position on each page.
When extracting large amounts of data, consider using the "Screen by screen" scroll type to ensure all content is properly loaded.
If data extraction is incomplete or inconsistent, try increasing the pagination interval to allow more time for content to load.
The "Simulate human pagination click" option may be necessary for websites that detect and block automated interactions.
For Excel output, ensure you have proper write permissions to the specified save location.
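For the last note, write access to a save location can be checked ahead of time outside the tool. The Python sketch below shows one way to do so, assuming the save location is a local or mounted folder. The helper name `can_write_to` is hypothetical and is not part of this command.

```python
import os
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if a file can be created in `directory`."""
    if not os.path.isdir(directory):
        return False
    try:
        # Creating a temporary file (deleted automatically on close) tests
        # real write access rather than only inspecting permission bits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


print(can_write_to(tempfile.gettempdir()))
```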