Create a scraping project

The Scraping Robot queue-based API lets users create scraping projects containing thousands of scraping tasks without worrying about retries or storing data; the system handles all of that for them.
Work with the system revolves around so-called Scraping Projects. A Scraping Project is a set of scraping tasks logically grouped under one name.

The maximum number of tasks per project is 10,000.


At minimum, a project needs only two required parameters:

  • name
  • tasks - an array of objects, one for each scraping task

Simplest scraping example:

POST https://dashboard.staging.scrapingrobot.com/api-sr/api/projects/create/html?<YOUR_SR_TOKEN>

# Request body:
{"name":"my-test-project","tasks":[{"url":"https://www.google.com/search?q=pizza"},{"url":"https://www.amazon.com/s?k=headphones"}]}

Scraping POST APIs

To send POST requests with Scraping Robot, add three more fields to the task in addition to the URL: requestType (set to "post"), contentType (one of the three types described below), and postBody (see the sketch after the list).


Currently, three types of POST payloads are supported:

  • application/x-www-form-urlencoded - pass the data to Scraping Robot as key-value pairs joined with the & symbol, with the key and value separated by the = symbol. The value does not need to be URL-encoded beforehand (SR does this itself). Example: "eventCounters=[]&jsType=ch&cid=2RR4a";
  • application/json - pass the data to Scraping Robot as a JSON object. Note that because the project itself is JSON-stringified, the fields containing the post data are stringified twice, so their inner quotes must be escaped. Example: "postBody": "{\"call\":\"createUser\",\"data\":{\"firstname\":\"John\",\"lastname\":\"Doe\"}}";
  • text/plain - the post data is a plain string and will be posted as is.
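
Below is a minimal sketch of a project containing one POST task, assuming the same create endpoint as above; the target URL and form fields are illustrative, not part of the API:

POST https://dashboard.staging.scrapingrobot.com/api-sr/api/projects/create/html?<YOUR_SR_TOKEN>

# Request body:
{
  "name": "my-post-project",
  "tasks": [
    {
      "url": "https://example.com/api/login",
      "requestType": "post",
      "contentType": "application/x-www-form-urlencoded",
      "postBody": "username=john&password=secret"
    }
  ]
}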

Webhooks

SR webhooks simplify working with the system for users who have the necessary tooling (an HTTP server that accepts requests from the scraper).
This feature lets the user be notified when their Scraping Project has been fully processed. In addition, webhooks can be configured so that the system automatically sends data to the user as it is processed.
To activate webhooks, add an extra field to the scraping project: the webhook field. This field is an object containing two child fields: url and action (see the sketch after the list).


Currently, two types of webhooks are supported:

  • notify - once all tasks in the project have been processed, the user receives a notification of this event (a POST HTTP request);
  • upload - after each task in the project is processed, the user receives the processed data for that task as a POST HTTP request.
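
Below is a minimal sketch of a project with a webhook that uploads results as they are processed, sent to the same create endpoint as above; the receiving URL is illustrative and must point at your own HTTP server:

# Request body:
{
  "name": "my-webhook-project",
  "tasks": [
    { "url": "https://www.google.com/search?q=pizza" }
  ],
  "webhook": {
    "url": "https://my-server.example.com/sr-webhook",
    "action": "upload"
  }
}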

Custom metainformation

You can add metainformation to each task you create, as well as to the project globally. This is a text field of up to 250 characters that lets you attach custom labels, identifiers, or other text for later use. To add information to the project globally, add a "custom" text field to the root of the JSON; to add information to an individual task, add a "custom" text field to the object representing that task.
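
Below is a minimal sketch of a request body combining both levels of custom metadata; the label values are illustrative:

# Request body:
{
  "name": "my-labeled-project",
  "custom": "batch-2024-04-01",
  "tasks": [
    {
      "url": "https://www.google.com/search?q=pizza",
      "custom": "keyword:pizza"
    }
  ]
}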

Custom request headers

You can give each task a set of custom HTTP headers, including the User-Agent, that will be used when loading the target site. To do this, add a headers field to the task: an object containing the desired headers. Note that when this feature is used, the system will not generate its own headers but will use only the custom ones, which may affect scraping quality.
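
Below is a minimal sketch of a task with custom headers; the header names and values shown are illustrative:

# Request body:
{
  "name": "my-headers-project",
  "tasks": [
    {
      "url": "https://www.google.com/search?q=pizza",
      "headers": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept-Language": "en-US,en;q=0.9"
      }
    }
  ]
}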
