How to paginate with ElasticSearch using Scroll in Rivery

Source: ElasticSearch Scroll documentation

1. Create two REST APIs

Rest API Set-Up: Rest API Source Walkthrough - Rest API Source

Call 1: Fetch first _scroll_id

POST https://{endpoint_base_url}/prod/{index}/_search?scroll=5m
{
  "size": {page_size},
  "query": {
    "match_all": {}
  }
}

Call 2: Populate the _scroll_id in this call and paginate over it
Note that the index is not part of the URL here.

POST https://{endpoint_base_url}/_search/scroll
{
  "scroll": "1m",
  "scroll_id": "{_scroll_id_from_body_output_call1}"
}

We will chain these two calls together in the next step and paginate over the second call.
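As a rough illustration of the two calls outside Rivery, here is a minimal Python sketch that only builds the URL and JSON body for each request. The function names, the "example.com" host, and the default keep-alive values are assumptions for illustration; the URL shapes and body fields are copied from the calls above.

```python
def build_first_call(base_url, index, page_size, keep_alive="5m"):
    """Call 1: open the scroll context on a specific index."""
    url = f"https://{base_url}/prod/{index}/_search?scroll={keep_alive}"
    body = {"size": page_size, "query": {"match_all": {}}}
    return url, body


def build_scroll_call(base_url, scroll_id, keep_alive="1m"):
    """Call 2: fetch the next page. The index is not part of the URL."""
    url = f"https://{base_url}/_search/scroll"
    body = {"scroll": keep_alive, "scroll_id": scroll_id}
    return url, body


if __name__ == "__main__":
    # Hypothetical host and index, just to show the resulting requests.
    print(build_first_call("example.com", "my-index", 100))
    print(build_scroll_call("example.com", "scroll-id-from-call-1"))
```

The only value carried from Call 1 to Call 2 is the _scroll_id, which is why the second URL does not need the index.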

2. Create a multi-action with pagination:

In step 1: This Multi-Action flow returns the START REST API output and puts the _scroll_id into a variable, {_scroll_id}.

In step 2: The "{_scroll_id}" variable gets populated in the body of the POST request.

In the return set-up, configure pagination so that the next page key _scroll_id is taken from the response and populated in the body parameter "scroll_id".

Make sure to associate the _scroll_id from step 1 with a new variable, {scroll_id_page}, which is used in the step 2 request body.
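For readers who want to see the pagination logic itself, the flow above can be sketched in Python roughly as follows. This is not how Rivery implements it; it is an assumption-laden sketch in which `post` is a hypothetical callable that sends a POST request and returns the parsed JSON response, and the loop stops when a page comes back with no hits.

```python
def scroll_all(post, first_url, first_body, scroll_url, keep_alive="1m"):
    """Yield every hit, mirroring step 1 (first call) and step 2 (paging).

    `post(url, body)` is a hypothetical transport function returning the
    response as a dict; it is injected so the loop can run without a network.
    """
    # Step 1: run the first call and keep its _scroll_id.
    response = post(first_url, first_body)
    scroll_id = response["_scroll_id"]
    hits = response["hits"]["hits"]

    # Step 2: keep sending the latest _scroll_id until a page is empty.
    while hits:
        yield from hits
        response = post(scroll_url, {"scroll": keep_alive, "scroll_id": scroll_id})
        # Use the most recent _scroll_id, falling back to the previous one.
        scroll_id = response.get("_scroll_id", scroll_id)
        hits = response["hits"]["hits"]
```

With the `requests` library installed, `post` could be something like `lambda url, body: requests.post(url, json=body).json()`, though authentication and error handling are left out here.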

3. Create a Source to Target River:
That's all! Now you can create a Source to Target River that runs the Multi-Action above and loads the results into your desired database.
