Declarative Pagination System in Truto Unified Real-time API

by The Truto Team

Posted Jun 28, 2024

This post is a guide for anyone looking to build a generic pagination system for their API integrations layer. It describes how Truto built the generic pagination system behind its real-time Unified API to handle various pagination formats without writing integration-specific code. Most importantly, it goes over the edge cases and quirks we had to handle while studying and building more than 250 APIs and integrations.

Real-time Unified API in this context means that Truto is able to fetch data from various APIs, normalize it into a common format, and serve it to the end-user in real-time. This is done without storing any of the end-user data.

If you cache the data from API responses yourself, in a database or any other kind of datastore, you might not need to read this post, because cached data can be paginated in whatever format best suits your use case.

The blog will cover the following topics:

  • Introduction about Truto

  • What is pagination?

  • Different formats of pagination

  • How Truto normalizes the pagination formats

  • Quirks and edge cases

This guide is intended for developers with prior experience in interacting with APIs and building API integrations.

Introduction about Truto

Truto is a Unified API platform that lets you build native, HTTP-based API integrations within your application. It handles the grunt work of API integrations, such as authentication, pagination, and error handling, all in real time and without storing any of your end-user data.

What is pagination?

If you have ever worked with REST or GraphQL APIs, you have probably come across pagination. Pagination is a technique for breaking a large set of data into smaller chunks. For example, if a CRM holds a list of 1000 contacts, you might not want to, or might not be able to, fetch all of them at once. Pagination lets you fetch a subset of the data at a time, such as 20 or 50 contacts.

Different formats of pagination

Some of the most common pagination formats are:

1. Offset-based pagination: This format uses an offset and a limit to fetch the data. For example, to fetch the first 20 contacts, you would use an offset of 0 and a limit of 20. To fetch the next 20 contacts, you would use an offset of 20 and a limit of 20. The most common starting offset is 0, but some APIs might use 1 as the starting offset. Most APIs out there use query parameters to specify the offset and limit. For example:

GET /contacts?offset=0&limit=20
GET /contacts?offset=20&limit=20

Some APIs take the offset and limit in the request body instead.

2. Page-based pagination: This format uses a page number and a page size to fetch the data. For example, to fetch the first page of 20 contacts, you would use page number 1 and a page size of 20. To fetch the next page of 20 contacts, you would use page number 2 and a page size of 20. The most common starting page number is 1, but some APIs might use 0 as the starting page number.

GET /contacts?page=1&pageSize=20
GET /contacts?page=2&pageSize=20

3. Cursor-based pagination: This format uses a cursor to fetch the data. The cursor is a unique identifier that points to a specific record in the dataset. To fetch the next set of data, you would use the cursor returned in the previous response.

The cursor can be opaque or transparent.

A transparent cursor is a direct reference to the record in the dataset, for example, the primary key of a record. Stripe uses transparent cursors in their APIs, where the cursor is the ID of the record and you would pass the ID of the last record fetched to get the next set of records.

An opaque cursor may or may not be a direct reference to the record in the dataset. It is usually a token that encodes the state of the query. You would pass this token to get the next set of records.

Cursor-based APIs usually also accept a limit that specifies the number of records to fetch.

Most of the APIs use query parameters to specify the cursor and limit. For example:

GET /contacts?cursor=abc123&limit=20

The cursor comes from the response to the previous request, either in the response body or in the response headers.

{
  "data": [
    { "id": 1, "name": "John Doe" }
  ],
  "cursor": "abc123"
}

4. Link header pagination: This format uses the `Link` response header to carry the URLs of the next and previous pages.

These URLs usually have two query parameters, `page` and `pageSize`, so this format can be thought of as a special case of page-based pagination. The `Link` header itself was standardized by RFC 5988 (since superseded by RFC 8288).

GitHub uses this format in their APIs.

The link header looks like this:

Link: <https://api.example.com/contacts?page=2&pageSize=20>; rel="next", <https://api.example.com/contacts?page=1&pageSize=20>; rel="prev"

Here `rel="next"` specifies the URL to the next page and `rel="prev"` specifies the URL to the previous page.

How Truto normalizes the pagination formats

Simple answer: Cursor-based pagination. Truto uses cursor-based pagination to normalize the pagination formats.

Cursor-based pagination is a format which can be used to represent all the other formats of pagination mentioned above. We take the underlying pagination data and base64 encode it to create an opaque cursor. Opaque here means that the cursor is not a direct reference to the record in the dataset, but it encodes the state of the query. You can still base64 decode the cursor to get the underlying pagination data.
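
As a rough illustration of the encoding step, here is a minimal Python sketch. It assumes the pagination state is serialized as a URL query string and encoded with unpadded, URL-safe base64, which is consistent with the sample cursors in this post. The function names `encode_cursor` and `decode_cursor` are hypothetical and not part of Truto's API.

```python
import base64
from urllib.parse import parse_qsl, urlencode


def encode_cursor(params: dict) -> str:
    """Serialize pagination state as a query string, then base64-encode it."""
    raw = urlencode(params).encode("utf-8")
    # Assumption: drop "=" padding so the cursor can sit in a URL unescaped.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> dict:
    """Reverse encode_cursor: restore padding, decode, parse the query string."""
    padded = cursor + "=" * (-len(cursor) % 4)
    raw = base64.urlsafe_b64decode(padded).decode("utf-8")
    return dict(parse_qsl(raw))
```

With these helpers, decoding `b2Zmc2V0PTIwJmxpbWl0PTIw` yields `offset=20` and `limit=20`.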

Truto's pagination format

Truto's pagination format contains two query parameters: `next_cursor` and `limit`. The `next_cursor` is the cursor to fetch the next set of data and the `limit` is the number of records to fetch in a single API call.

An example request to fetch the first 20 contacts from a CRM using the Unified API would look like this:

GET /api/unified/contacts?limit=20

The response from our Unified API would look like this:

{
  "results": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTIwJmxpbWl0PTIw"
}

And the next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=b2Zmc2V0PTIwJmxpbWl0PTIw&limit=20

The next_cursor contains an opaque cursor which the client can use to fetch the next set of data.

Once the client has fetched all the data, the next_cursor will be null.
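
From the client's side, consuming this format comes down to a loop that follows `next_cursor` until it is null. The sketch below shows that loop in Python. The `fetch_page` callable and the in-memory `fake_fetch_page` stand-in are hypothetical: a real client would issue the HTTP request shown above instead, and the fake uses a plain number as its cursor purely to keep the example short.

```python
from typing import Callable, Iterator, Optional

# Hypothetical fetcher signature: (next_cursor, limit) -> parsed JSON response.
FetchPage = Callable[[Optional[str], int], dict]


def iterate_records(fetch_page: FetchPage, limit: int = 20) -> Iterator[dict]:
    """Yield every record by following next_cursor until it comes back null."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["results"]
        cursor = page.get("next_cursor")
        if cursor is None:
            return


def fake_fetch_page(cursor: Optional[str], limit: int) -> dict:
    """In-memory stand-in for GET /api/unified/contacts, for demonstration only."""
    contacts = [{"id": i, "name": f"Contact {i}"} for i in range(1, 46)]
    start = int(cursor) if cursor else 0
    chunk = contacts[start:start + limit]
    has_more = start + limit < len(contacts)
    return {"results": chunk, "next_cursor": str(start + limit) if has_more else None}


all_contacts = list(iterate_records(fake_fetch_page, limit=20))
```

The client never inspects the cursor. It only passes it back, which is what allows the same loop to work regardless of how the underlying API paginates.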

How the cursor is generated for various pagination formats

We'll now look at how the `next_cursor` is generated for various pagination formats.

Offset-based pagination

If the underlying API uses offset-based pagination and accepts `offset` and `page_size` query parameters, you can base64 encode the offset and page size to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `page_size` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA"
}

The cursor is base64 encoded and contains the following string: `offset=20&page_size=20`.

As you can see, the cursor contains the offset and page size needed to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `offset` and `page_size` query parameters and passes them to the CRM API to fetch the next set of data.

Here, the `limit` query parameter of the Unified API is ignored.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTQwJnBhZ2Vfc2l6ZT0yMA"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the limit specified by the client. If the number of records is less than the limit, it means that there are no more records to fetch and the next_cursor will be null.
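
The offset arithmetic and the stop condition can be sketched together in a few lines of Python. This is an illustration under assumptions rather than Truto's implementation: the function name `next_offset_cursor` is hypothetical, and the unpadded URL-safe base64 encoding is inferred from the sample cursors above.

```python
import base64
from typing import Optional
from urllib.parse import urlencode


def next_offset_cursor(offset: int, page_size: int, records_returned: int) -> Optional[str]:
    """Return the cursor for the following page, or None on the last page."""
    if records_returned < page_size:
        # A short page means the underlying API has run out of records.
        return None
    state = urlencode({"offset": offset + page_size, "page_size": page_size})
    # Assumption: unpadded URL-safe base64, matching the sample cursors.
    return base64.urlsafe_b64encode(state.encode("utf-8")).decode("ascii").rstrip("=")
```

Note that when the total number of records is an exact multiple of the page size, this check needs one extra request, which returns an empty page, before the cursor becomes null.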

Page-based pagination

If you have a page-based pagination which accepts a `page` and a `pageSize` query parameter, you can base64 encode the page and pageSize to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIw"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20`. Here we assume the API is 1-indexed.

As you can see, the cursor contains the page and pageSize which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIw&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `page` and `pageSize` query parameters and passes them to the CRM API to fetch the next set of data.

Here, the `limit` query parameter of the Unified API is ignored.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0zJnBhZ2VTaXplPTIw"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the pageSize specified by the client. If the number of records is less than the pageSize, it means that there are no more records to fetch and the next_cursor will be null.
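
The full request cycle for a page-based API can be sketched as two small functions: one that decides which parameters to send upstream, and one that builds the next cursor from the result. Everything named here (`START_PAGE`, `upstream_params`, `next_cursor_for`) is hypothetical, and the sketch assumes a 1-indexed API and the same unpadded URL-safe base64 encoding seen in the sample cursors.

```python
import base64
from typing import Optional
from urllib.parse import parse_qsl, urlencode

START_PAGE = 1  # assumption: the underlying API is 1-indexed


def _encode(params: dict) -> str:
    raw = urlencode(params).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def _decode(cursor: str) -> dict:
    padded = cursor + "=" * (-len(cursor) % 4)
    return dict(parse_qsl(base64.urlsafe_b64decode(padded).decode("utf-8")))


def upstream_params(next_cursor: Optional[str], limit: int) -> dict:
    """Query parameters to send to the underlying API for one unified request."""
    if next_cursor is not None:
        # The cursor wins: the unified `limit` is ignored on follow-up requests.
        return _decode(next_cursor)
    return {"page": str(START_PAGE), "pageSize": str(limit)}


def next_cursor_for(params: dict, records_returned: int) -> Optional[str]:
    """Build the cursor for the following page, or None after a short page."""
    page, page_size = int(params["page"]), int(params["pageSize"])
    if records_returned < page_size:
        return None
    return _encode({"page": str(page + 1), "pageSize": str(page_size)})
```

Because the page size travels inside the cursor, a client that changes `limit` halfway through cannot shift the page boundaries and skip or repeat records.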

Link header pagination

The link header pagination format is easy to handle because underlying APIs directly return the exact URL to fetch the next set of data.

The approach is to parse the header from the response, extract the URL with `rel="next"`, and use that URL's query parameters as the value of the `next_cursor`.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20&foo=bar

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API. Sometimes, even though the APIs use link header pagination, the query parameter name might be different, so Truto makes it configurable, but by default, it uses `pageSize`.

We are also passing an extra query parameter `foo=bar` to the CRM API, which is not used in the pagination.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20&foo=bar`.

As you can see, the cursor contains the page, pageSize, and the extra query parameter `foo`. The cursor is calculated from a Link header like this:

Link: <https://api.crm.com/contacts?page=2&pageSize=20&foo=bar>; rel="next"

The next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI&limit=20

The `limit` query parameter of the Unified API is ignored.

Knowing when to stop

Truto checks if the link header contains a URL with `rel="next"`. If it does not contain a URL with `rel="next"`, it means that there are no more records to fetch and the next_cursor will be null.
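
The parse-and-extract step might look like the following Python sketch. The function names are hypothetical, and the regular expressions are a deliberately simplified parser: they cover headers shaped like the examples in this post, but not commas or semicolons inside quoted parameter values. The unpadded URL-safe base64 encoding is again an assumption based on the sample cursors.

```python
import base64
import re
from typing import Optional
from urllib.parse import urlsplit

# Matches one `<url>; param; param` entry of a Link header (simplified).
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*((?:;\s*[^;,]+)*)')
_REL_PARAM = re.compile(r'rel\s*=\s*"?([^";,]+)"?')


def find_next_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL whose rel includes "next", or None if there is none."""
    if not link_header:
        return None
    for url, params in _LINK_ENTRY.findall(link_header):
        rel = _REL_PARAM.search(params)
        if rel and "next" in rel.group(1).split():
            return url
    return None


def cursor_from_link_header(link_header: Optional[str]) -> Optional[str]:
    """Encode the query string of the rel="next" URL as the next_cursor."""
    next_url = find_next_url(link_header)
    if next_url is None:
        return None  # no rel="next" entry, so this was the last page
    query = urlsplit(next_url).query
    return base64.urlsafe_b64encode(query.encode("utf-8")).decode("ascii").rstrip("=")
```

Encoding the whole query string, rather than only `page` and `pageSize`, is what carries extra parameters such as `foo=bar` forward without any special handling.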

Cursor-based pagination

Cursor-based pagination is easy to handle because underlying APIs directly return the cursor to fetch the next set of data.

As mentioned earlier, the cursor can be transparent or opaque. It is either found in a field of the response body or needs to be calculated from the response body (as with Stripe, where the next_cursor is the ID of the last record in the response).

So Truto uses a JSON path to extract the cursor from the response.

For example, if the raw API response looks something like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "meta": {
    "pagination": {
      "next": "foobar"
    }
  }
}

Then the JSON path to extract the cursor would be `meta.pagination.next`.

In the case of Stripe, the cursor is the ID of the last record in the response, so the JSON path would be `data[-1].id`. The `-1` index selects the last element of the array.

We just use the value found at the JSON path as the value of the `next_cursor`.

Knowing when to stop

Truto checks if the cursor (value found at the JSON path) is null. If the cursor is null, it means that there are no more records to fetch and the next_cursor will be null.

Some APIs also return a `has_more` or equivalent field in the response body, typically a boolean, to indicate whether there are more records to fetch. Truto also checks this field to know when to stop.
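
To make the path lookup concrete, here is a small Python sketch that resolves paths like `meta.pagination.next` and `data[-1].id` and applies both stop conditions. It is not a full JSONPath implementation and not Truto's code: `get_path`, `extract_cursor`, and the `has_more_path` parameter are hypothetical names, and only dotted keys and integer indexes are supported.

```python
import re
from typing import Any, Optional

# One token is either a key (`meta`) or a bracketed index (`[-1]`).
_TOKEN = re.compile(r'([^.\[\]]+)|\[(-?\d+)\]')


def get_path(doc: Any, path: str) -> Any:
    """Walk `doc` along `path`; return None if any step is missing."""
    current = doc
    for key, index in _TOKEN.findall(path):
        try:
            current = current[key] if key else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def extract_cursor(body: dict, cursor_path: str,
                   has_more_path: Optional[str] = None) -> Optional[str]:
    """Return the next cursor, or None when the response signals the end."""
    if has_more_path is not None and get_path(body, has_more_path) is False:
        return None
    value = get_path(body, cursor_path)
    return None if value is None else str(value)
```

For a Stripe-style response, `extract_cursor(body, "data[-1].id", "has_more")` returns the ID of the last record while `has_more` is true, and null once it turns false or the `data` array is empty.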

When APIs return the next link in the response body

In some cases, the APIs directly return the link to the next set of data in the response body regardless of the pagination format. So Truto considers them as cursor-based pagination and extracts the URL query parameters from the next link in the response body and encodes them into the `next_cursor`.

For example, if the raw API response looks something like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next": "https://api.crm.com/contacts?page=2&pageSize=20"
}

Truto extracts the query parameters from the `next` field and encodes them into the `next_cursor`.
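
A minimal Python sketch of this case follows. The function name `cursor_from_next_link` and its `field` parameter are hypothetical, and the unpadded URL-safe base64 encoding is an assumption carried over from the sample cursors earlier in the post.

```python
import base64
from typing import Optional
from urllib.parse import urlsplit


def cursor_from_next_link(body: dict, field: str = "next") -> Optional[str]:
    """Encode the query string of the body's next link as the next_cursor."""
    next_url = body.get(field)
    if not next_url:
        return None  # no next link in the body: nothing left to fetch
    query = urlsplit(next_url).query
    return base64.urlsafe_b64encode(query.encode("utf-8")).decode("ascii").rstrip("=")
```

For the response above, this produces `cGFnZT0yJnBhZ2VTaXplPTIw`, the same cursor as in the page-based example.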

Quirks and edge cases

We will now list a few quirks and edge cases Truto handles for the various pagination formats found in the wild:

  • The API path for fetching the subsequent pages might be different from the initial request. Two well-known products have slight variations of this:

      • Dropbox: Dropbox provides one URL for the initial request and a different URL for fetching the subsequent pages, and you can't use any other query parameters in the subsequent requests. Truto solves this by making the pagination URL configurable (pagination_path) and adding a flag to ignore the query parameters in the subsequent requests (ignore_limit_in_pagination).

      • Salesforce: In their SOQL Query API, Salesforce returns the URL to the next set of records, but encoding just the query parameters is not enough here; the path of the URL also needs to be considered. Truto solves this by adding a flag to include the path of the URL as well (include_path).

  • Starting offset and page number might be 0 or 1. Truto handles this by making the starting offset and page configurable.

  • Enforcing maximum limit. Truto enforces a maximum limit on the number of records that can be fetched in a single request. This is configurable based on the maximum limit of the underlying API.

  • Some APIs might always require the limit or other query parameters to be present in the request. Truto handles this by always including the default values of the query parameters in the request.

  • Not all APIs accept the pagination data in the query parameters. Some APIs accept the pagination data in the request body. Truto handles this by decoupling the pagination parsing and encoding from the request building.

  • Some cursor-based APIs never return a null cursor. So Truto checks if the number of records fetched in the response is less than the limit and stops fetching the records. It also employs a SHA-1 hash of the request body to check if the same records are being fetched again.
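
The last quirk, stopping when a cursor never becomes null, can be sketched as follows in Python. This is an illustration under assumptions: the names `fingerprint` and `should_stop` are hypothetical, the sketch does not care which payload is hashed, and the post does not say where the previous hash is kept, so here the caller simply passes it in.

```python
import hashlib
import json
from typing import Optional


def fingerprint(payload: object) -> str:
    """SHA-1 hex digest of a JSON-serializable payload, in canonical form."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def should_stop(records: list, limit: int,
                current_fp: str, previous_fp: Optional[str]) -> bool:
    """Decide whether to return a null next_cursor despite a non-null upstream cursor."""
    if len(records) < limit:
        return True  # short page: the dataset is exhausted
    # Identical fingerprints mean the API is serving the same page again.
    return previous_fp is not None and current_fp == previous_fp
```

SHA-1 is used here only to detect repeats, not for security, so its known collision weaknesses are not a concern in this role.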

This blog should serve as a guide to anyone who is looking to build a generic pagination system for their API integrations layer. It highlights how Truto has built its generic pagination system for it's real-time Unified API to handle various pagination formats without writing integration-specific code. And most importantly, it goes over the various edge cases and quirks it had to handle over the course of looking at and building over 250+ APIs and integrations.

Real-time Unified API in this context means that Truto is able to fetch data from various APIs, normalize it into a common format, and serve it to the end-user in real-time. This is done without storing any of the end-user data.

If you are caching the data from the API responses with you, either in a database or any kind of datastore, you might not need to read this blog because all of the cached data can be paginated in whatever format you think is best for your use case.

The blog will cover the following topics:

  • Introduction about Truto

  • What is pagination?

  • Different formats of pagination

  • How Truto normalizes the pagination formats

  • Quirks and edge cases

This guide is intended for developers with prior experience in interacting with APIs and building API integrations.

Introduction about Truto

Truto is a Unified API platform that allows you to build native HTTP based API integrations within your application. It handles all the grunt work related to API integrations like authentication, pagination, and error handling. All of this in real-time without storing any of your end-user data.

What is pagination?

If you have ever worked with REST or GraphQL APIs, you might have come across pagination. Pagination is a technique used to break down a large set of data into smaller chunks. For example, if you have a list of 1000 contacts in a CRM, you might not want to or cannot fetch all of them at once in certain scenarios. Pagination allows you to fetch a subset of the data at a time, like 20 or 50 contacts at a time.

Different formats of pagination

Some of the most common pagination formats are:

1. Offset-based pagination: This format uses an offset and a limit to fetch the data. For example, to fetch the first 20 contacts, you would use an offset of 0 and a limit of 20. To fetch the next 20 contacts, you would use an offset of 20 and a limit of 20. The most common starting offset is 0, but some APIs might use 1 as the starting offset. Most APIs out there use query parameters to specify the offset and limit. For example:

GET /contacts?offset=0&limit=20 GET /contacts?offset=20&limit=20

Some of them might also specify it in the request body.

2. Page-based pagination: This format uses a page number and a page size to fetch the data. For example, to fetch the first page of 20 contacts, you would use page number 1 and a page size of 20. To fetch the next page of 20 contacts, you would use page number 2 and a page size of 20. The most common starting page number is 1, but some APIs might use 0 as the starting page number.

GET /contacts?page=1&pageSize=20 GET /contacts?page=2&pageSize=20

3. Cursor-based pagination: This format uses a cursor to fetch the data. The cursor is a unique identifier that points to a specific record in the dataset. To fetch the next set of data, you would use the cursor returned in the previous response.

The cursor can be opaque or transparent.

A transparent cursor is a direct reference to the record in the dataset, for example, the primary key of a record. Stripe uses transparent cursors in their APIs, where the cursor is the ID of the record and you would pass the ID of the last record fetched to get the next set of records.

An opaque cursor may or may not be a direct reference to the record in the dataset. It is usually a token that encodes the state of the query. You would pass this token to get the next set of records.

Cursor-based paginations also accept a limit to specify the number of records to fetch.

Most of the APIs use query parameters to specify the cursor and limit. For example:

GET /contacts?cursor=abc123&limit=20

The cursor can be found from the response of the previous request. It can be in the response body or in the response headers.

{ 
  "data": [ 
    { "id": 1, "name": "John Doe" } 
  ], 
    "cursor": "abc123" 
}

4. Link header pagination: This format uses the Link header to specify the next and previous URLs to fetch the data. The Link header contains the URL to the next and previous pages.

They usually have two query parameters, `page` and `pageSize` and can be thought of a special case of page-based pagination which has been standardized by RFC 5988.

Github uses this format in their APIs.

The link header looks like this:

Link: <https://api.example.com/contacts?page=2&pageSize=20>; rel="next", <https://api.example.com/contacts?page=1&pageSize=20>; rel="prev"

Here `rel="next"` specifies the URL to the next page and `rel="prev"` specifies the URL to the previous page.

How Truto normalizes the pagination formats

Simple answer: Cursor-based pagination. Truto uses cursor-based pagination to normalize the pagination formats.

Cursor-based pagination is a format which can be used to represent all the other formats of pagination mentioned above. We take the underlying pagination data and base64 encode it to create an opaque cursor. Opaque here means that the cursor is not a direct reference to the record in the dataset, but it encodes the state of the query. You can still base64 decode the cursor to get the underlying pagination data.

Truto's pagination format

Truto's pagination format contains two query parameters: `next_cursor` and `limit`. The `next_cursor` is the cursor to fetch the next set of data and the `limit` is the number of records to fetch in a single API call.

An example request to fetch the first 20 contacts from a CRM using the Unified API would look like this:

GET /api/unified/contacts?limit=20

The response from our Unified API would look like this:

{ 
  "results": [
    { "id": 1, "name": "John Doe" 
    }, 
    { "id": 2, "name": "Jane Doe" 
    }, 
    ... 
  ], 
  "next_cursor": "b2Zmc2V0PTIwJmxpbWl0PTIw" 
}

And the next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=b2Zmc2V0PTIwJmxpbWl0PTIw&limit=20

The next_cursor contains an opaque cursor which the client can use to fetch the next set of data.

Once the client has fetched all the data, the next_cursor will be null.

How the cursor is generated for various pagination formats

We'll now look at how the `next_cursor` is generated for various pagination formats.

Offset-based pagination

If you have an offset-based pagination which accepts an `offset` and a `page_size` query parameter, you can base64 encode the offset and limit to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `page_size` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA"
}

The cursor is base64 encoded and contains the following string: `offset=20&page_size=20`.

As you can see, the cursor contains the offset and limit which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `offset` and `page_size` query parameters and passes them to the CRM API to fetch the next set of data.

Here `limit` query parameter of the Unified API is ignored.

The response from the CRM API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTQwJnBhZ2Vfc2l6ZT0yMA"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the limit specified by the client. If the number of records is less than the limit, it means that there are no more records to fetch and the next_cursor will be null.

Page-based pagination

If you have a page-based pagination which accepts a `page` and a `pageSize` query parameter, you can base64 encode the page and pageSize to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIw"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20`. Here we assume the API is 1-indexed.

As you can see, the cursor contains the page and pageSize which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIw&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `page` and `pageSize` query parameters and passes them to the CRM API to fetch the next set of data.

Here `limit` query parameter of the Unified API is ignored.

The response from the CRM API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0zJnBhZ2VTaXplPTIw"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the pageSize specified by the client. If the number of records is less than the pageSize, it means that there are no more records to fetch and the next_cursor will be null.

Link header pagination

The link header pagination format is easy to handle because underlying APIs directly return the exact URL to fetch the next set of data.

So the approach here is to first parse the header from the response and then extract the URL with `rel="next"` and then just use the query parameters in that as the value of the `next_cursor`.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20&foo=bar

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API. Sometimes, even though the APIs use link header pagination, the query parameter name might be different, so Truto makes it configurable, but by default, it uses `pageSize`.

We are also passing an extra query parameter `foo=bar` to the CRM API, which is not used in the pagination.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20&foo=bar`.

As you can see, the cursor contains the page, pageSize, and the extra query parameter `foo`. The cursor is calculated from an link header like this:

Link: <https://api.crm.com/contacts?page=2&pageSize=20&foo=bar>; rel="next"

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI&limit=20

The `limit` query parameter of the Unified API is ignored.

Knowing when to stop

Truto checks if the link header contains a URL with `rel="next"`. If it does not contain a URL with `rel="next"`, it means that there are no more records to fetch and the next_cursor will be null.

Cursor-based pagination

Cursor-based pagination is easy to handle because underlying APIs directly return the cursor to fetch the next set of data.

As mentioned earlier, the cursor can be transparent or opaque, and it can be either found a field in the response body or needs to be calculated from the response body (Stripe, where the next_cursor is the ID of the last record in the response).

So Truto provides a JSON path to extract the cursor from the response.

For example, if the raw API response looks something like,

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "meta": {
    "pagination": {
      "next": "foobar"
    }
  }
}

Then the JSON path to extract the cursor would be `meta.pagination.next`.

In case of Stripe, the cursor is the ID of the last record in the response, so the JSON path would be `data[-1].id`. `-1` is used to get the last element of the array.

We just use the value found at the JSON path as the value of the `next_cursor`.

Knowing when to stop

Truto checks if the cursor (value found at the JSON path) is null. If the cursor is null, it means that there are no more records to fetch and the next_cursor will be null.

Some APIs also return a `has_more` or equivalent field which is typically a boolean, in the response body to indicate if there are more records to fetch. Truto also checks this field to know when to stop.

When APIs return the next link in the response body

In some cases, the APIs directly return the link to the next set of data in the response body regardless of the pagination format. So Truto considers them as cursor-based pagination and extracts the URL query parameters from the next link in the response body and encodes them into the `next_cursor`.

For example, if the raw API response looks something like,

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next": "https://api.crm.com/contacts?page=2&pageSize=20"
}

Truto extracts the query parameters from the `next` field and encodes them into the `next_cursor`.

Quirks and edge cases

We will now list a few quirks and edge cases Truto handles for the various pagination formats found in the wild —

  • The API path for fetching the subsequent pages might be different from the initial request. There are two famous products which have slight variations of this

  • Dropbox: Dropbox provides one URL for the initial request and a different URL for fetching the subsequent pages, and and and, you can't use any other query parameters in the subsequent requests. Truto solves this by making the pagination URL configurable (pagination_path) and adding a flag to ignore the query parameters in the subsequent requests (ignore_limit_in_pagination).

  • Salesforce: In thier SOQL Query API, Salesforce returns the URL to the next set of records, but here it's not enough to just encode the query parameters, the path of the URL also needs to be considered. Truto solves this by adding a flag to include the path of the URL as well (include_path).

  • Starting offset and page number might be 0 or 1. Truto handles this by making the starting offset and page configurable.

  • Enforcing maximum limit. Truto enforces a maximum limit on the number of records that can be fetched in a single request. This is configurable based on the maximum limit of the underlying API.

  • Some APIs might always require the limit or other query parameters to be present in the request. Truto handles this by always including the default values of the query parameters in the request.

  • Not all APIs accept the pagination data in the query parameters. Some APIs accept the pagination data in the request body. Truto handles this by decoupling the pagination parsing and encoding from the request building.

  • Some cursor-based APIs never return a null cursor. So Truto checks if the number of records fetched in the response is less than the limit and stops fetching the records. It also employs a SHA-1 hash of the request body to check if the same records are being fetched again.
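A couple of the safeguards above (clamping the limit, and detecting an API that never signals the end) can be sketched as follows. Everything here is an assumption made for illustration: `PaginationConfig` and its field names, `effective_limit`, `fingerprint`, and `should_stop` are hypothetical and do not reflect Truto's actual configuration schema or code.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class PaginationConfig:
    # Hypothetical per-integration settings.
    start_page: int = 1       # some APIs start at 0, others at 1
    default_limit: int = 20   # always sent, for APIs that require a limit
    max_limit: int = 100      # upper bound allowed by the underlying API


def effective_limit(requested: Optional[int], config: PaginationConfig) -> int:
    """Fall back to the default and clamp to the API's maximum."""
    if requested is None:
        return config.default_limit
    return max(1, min(requested, config.max_limit))


def fingerprint(payload: Any) -> str:
    """SHA-1 of a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def should_stop(
    records: list,
    limit: int,
    next_request_body: Any,
    last_request_hash: Optional[str],
) -> bool:
    """Stop on a short page, or when the next request repeats the last one."""
    if len(records) < limit:
        return True
    return fingerprint(next_request_body) == last_request_hash
```

Hashing a canonical form of the request keeps the loop check cheap, since only the previous fingerprint has to be remembered between calls.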

This blog should serve as a guide to anyone who is looking to build a generic pagination system for their API integrations layer. It highlights how Truto has built its generic pagination system for it's real-time Unified API to handle various pagination formats without writing integration-specific code. And most importantly, it goes over the various edge cases and quirks it had to handle over the course of looking at and building over 250+ APIs and integrations.

Real-time Unified API in this context means that Truto is able to fetch data from various APIs, normalize it into a common format, and serve it to the end-user in real-time. This is done without storing any of the end-user data.

If you are caching the data from the API responses with you, either in a database or any kind of datastore, you might not need to read this blog because all of the cached data can be paginated in whatever format you think is best for your use case.

The blog will cover the following topics:

  • Introduction about Truto

  • What is pagination?

  • Different formats of pagination

  • How Truto normalizes the pagination formats

  • Quirks and edge cases

This guide is intended for developers with prior experience in interacting with APIs and building API integrations.

Introduction about Truto

Truto is a Unified API platform that allows you to build native HTTP based API integrations within your application. It handles all the grunt work related to API integrations like authentication, pagination, and error handling. All of this in real-time without storing any of your end-user data.

What is pagination?

If you have ever worked with REST or GraphQL APIs, you might have come across pagination. Pagination is a technique used to break down a large set of data into smaller chunks. For example, if you have a list of 1000 contacts in a CRM, you might not want to or cannot fetch all of them at once in certain scenarios. Pagination allows you to fetch a subset of the data at a time, like 20 or 50 contacts at a time.

Different formats of pagination

Some of the most common pagination formats are:

1. Offset-based pagination: This format uses an offset and a limit to fetch the data. For example, to fetch the first 20 contacts, you would use an offset of 0 and a limit of 20. To fetch the next 20 contacts, you would use an offset of 20 and a limit of 20. The most common starting offset is 0, but some APIs might use 1 as the starting offset. Most APIs out there use query parameters to specify the offset and limit. For example:

GET /contacts?offset=0&limit=20 GET /contacts?offset=20&limit=20

Some of them might also specify it in the request body.

2. Page-based pagination: This format uses a page number and a page size to fetch the data. For example, to fetch the first page of 20 contacts, you would use page number 1 and a page size of 20. To fetch the next page of 20 contacts, you would use page number 2 and a page size of 20. The most common starting page number is 1, but some APIs might use 0 as the starting page number.

GET /contacts?page=1&pageSize=20 GET /contacts?page=2&pageSize=20

3. Cursor-based pagination: This format uses a cursor to fetch the data. The cursor is a unique identifier that points to a specific record in the dataset. To fetch the next set of data, you would use the cursor returned in the previous response.

The cursor can be opaque or transparent.

A transparent cursor is a direct reference to the record in the dataset, for example, the primary key of a record. Stripe uses transparent cursors in their APIs, where the cursor is the ID of the record and you would pass the ID of the last record fetched to get the next set of records.

An opaque cursor may or may not be a direct reference to the record in the dataset. It is usually a token that encodes the state of the query. You would pass this token to get the next set of records.

Cursor-based paginations also accept a limit to specify the number of records to fetch.

Most of the APIs use query parameters to specify the cursor and limit. For example:

GET /contacts?cursor=abc123&limit=20

The cursor can be found from the response of the previous request. It can be in the response body or in the response headers.

{ 
  "data": [ 
    { "id": 1, "name": "John Doe" } 
  ], 
    "cursor": "abc123" 
}

4. Link header pagination: This format uses the Link header to specify the next and previous URLs to fetch the data. The Link header contains the URL to the next and previous pages.

They usually have two query parameters, `page` and `pageSize` and can be thought of a special case of page-based pagination which has been standardized by RFC 5988.

Github uses this format in their APIs.

The link header looks like this:

Link: <https://api.example.com/contacts?page=2&pageSize=20>; rel="next", <https://api.example.com/contacts?page=1&pageSize=20>; rel="prev"

Here `rel="next"` specifies the URL to the next page and `rel="prev"` specifies the URL to the previous page.

How Truto normalizes the pagination formats

Simple answer: Cursor-based pagination. Truto uses cursor-based pagination to normalize the pagination formats.

Cursor-based pagination is a format which can be used to represent all the other formats of pagination mentioned above. We take the underlying pagination data and base64 encode it to create an opaque cursor. Opaque here means that the cursor is not a direct reference to the record in the dataset, but it encodes the state of the query. You can still base64 decode the cursor to get the underlying pagination data.

Truto's pagination format

Truto's pagination format contains two query parameters: `next_cursor` and `limit`. The `next_cursor` is the cursor to fetch the next set of data and the `limit` is the number of records to fetch in a single API call.

An example request to fetch the first 20 contacts from a CRM using the Unified API would look like this:

GET /api/unified/contacts?limit=20

The response from our Unified API would look like this:

{ 
  "results": [
    { "id": 1, "name": "John Doe" 
    }, 
    { "id": 2, "name": "Jane Doe" 
    }, 
    ... 
  ], 
  "next_cursor": "b2Zmc2V0PTIwJmxpbWl0PTIw" 
}

And the next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=b2Zmc2V0PTIwJmxpbWl0PTIw&limit=20

The next_cursor contains an opaque cursor which the client can use to fetch the next set of data.

Once the client has fetched all the data, the next_cursor will be null.

How the cursor is generated for various pagination formats

We'll now look at how the `next_cursor` is generated for various pagination formats.

Offset-based pagination

If you have an offset-based pagination which accepts an `offset` and a `page_size` query parameter, you can base64 encode the offset and limit to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `page_size` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA"
}

The cursor is base64 encoded and contains the following string: `offset=20&page_size=20`.

As you can see, the cursor contains the offset and limit which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `offset` and `page_size` query parameters and passes them to the CRM API to fetch the next set of data.

Here `limit` query parameter of the Unified API is ignored.

The response from the CRM API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTQwJnBhZ2Vfc2l6ZT0yMA"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the limit specified by the client. If the number of records is less than the limit, it means that there are no more records to fetch and the next_cursor will be null.

Page-based pagination

If you have a page-based pagination which accepts a `page` and a `pageSize` query parameter, you can base64 encode the page and pageSize to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIw"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20`. Here we assume the API is 1-indexed.

As you can see, the cursor contains the page and pageSize which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIw&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `page` and `pageSize` query parameters and passes them to the CRM API to fetch the next set of data.

Here `limit` query parameter of the Unified API is ignored.

The response from the CRM API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0zJnBhZ2VTaXplPTIw"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the pageSize specified by the client. If the number of records is less than the pageSize, it means that there are no more records to fetch and the next_cursor will be null.

Link header pagination

The link header pagination format is easy to handle because underlying APIs directly return the exact URL to fetch the next set of data.

So the approach here is to first parse the header from the response and then extract the URL with `rel="next"` and then just use the query parameters in that as the value of the `next_cursor`.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20&foo=bar

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API. Sometimes, even though the APIs use link header pagination, the query parameter name might be different, so Truto makes it configurable, but by default, it uses `pageSize`.

We are also passing an extra query parameter `foo=bar` to the CRM API, which is not used in the pagination.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20&foo=bar`.

As you can see, the cursor contains the page, pageSize, and the extra query parameter `foo`. The cursor is calculated from an link header like this:

Link: <https://api.crm.com/contacts?page=2&pageSize=20&foo=bar>; rel="next"

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI&limit=20

The `limit` query parameter of the Unified API is ignored.

Knowing when to stop

Truto checks if the link header contains a URL with `rel="next"`. If it does not contain a URL with `rel="next"`, it means that there are no more records to fetch and the next_cursor will be null.

Cursor-based pagination

Cursor-based pagination is easy to handle because underlying APIs directly return the cursor to fetch the next set of data.

As mentioned earlier, the cursor can be transparent or opaque, and it can be either found a field in the response body or needs to be calculated from the response body (Stripe, where the next_cursor is the ID of the last record in the response).

So Truto provides a JSON path to extract the cursor from the response.

For example, if the raw API response looks something like,

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "meta": {
    "pagination": {
      "next": "foobar"
    }
  }
}

Then the JSON path to extract the cursor would be `meta.pagination.next`.

In case of Stripe, the cursor is the ID of the last record in the response, so the JSON path would be `data[-1].id`. `-1` is used to get the last element of the array.

We just use the value found at the JSON path as the value of the `next_cursor`.

Knowing when to stop

Truto checks if the cursor (value found at the JSON path) is null. If the cursor is null, it means that there are no more records to fetch and the next_cursor will be null.

Some APIs also return a `has_more` or equivalent field which is typically a boolean, in the response body to indicate if there are more records to fetch. Truto also checks this field to know when to stop.

When APIs return the next link in the response body

In some cases, the APIs directly return the link to the next set of data in the response body regardless of the pagination format. So Truto considers them as cursor-based pagination and extracts the URL query parameters from the next link in the response body and encodes them into the `next_cursor`.

For example, if the raw API response looks something like,

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next": "https://api.crm.com/contacts?page=2&pageSize=20"
}

Truto extracts the query parameters from the `next` field and encodes them into the `next_cursor`.

Quirks and edge cases

We will now list a few quirks and edge cases Truto handles for the various pagination formats found in the wild —

  • The API path for fetching the subsequent pages might be different from the initial request. There are two famous products which have slight variations of this

  • Dropbox: Dropbox provides one URL for the initial request and a different URL for fetching the subsequent pages, and and and, you can't use any other query parameters in the subsequent requests. Truto solves this by making the pagination URL configurable (pagination_path) and adding a flag to ignore the query parameters in the subsequent requests (ignore_limit_in_pagination).

  • Salesforce: In thier SOQL Query API, Salesforce returns the URL to the next set of records, but here it's not enough to just encode the query parameters, the path of the URL also needs to be considered. Truto solves this by adding a flag to include the path of the URL as well (include_path).

  • Starting offset and page number might be 0 or 1. Truto handles this by making the starting offset and page configurable.

  • Enforcing maximum limit. Truto enforces a maximum limit on the number of records that can be fetched in a single request. This is configurable based on the maximum limit of the underlying API.

  • Some APIs might always require the limit or other query parameters to be present in the request. Truto handles this by always including the default values of the query parameters in the request.

  • Not all APIs accept the pagination data in the query parameters. Some APIs accept the pagination data in the request body. Truto handles this by decoupling the pagination parsing and encoding from the request building.

  • Some cursor based APIs never return a null cursor. So Truto checks if the number of records fetched in the response is less than the limit and stops fetching the records. It also employs a SHA-1 has of the request body to check if the same records are being fetched again.

This blog should serve as a guide to anyone who is looking to build a generic pagination system for their API integrations layer. It highlights how Truto has built its generic pagination system for it's real-time Unified API to handle various pagination formats without writing integration-specific code. And most importantly, it goes over the various edge cases and quirks it had to handle over the course of looking at and building over 250+ APIs and integrations.

Real-time Unified API in this context means that Truto is able to fetch data from various APIs, normalize it into a common format, and serve it to the end-user in real-time. This is done without storing any of the end-user data.

If you are caching the data from the API responses with you, either in a database or any kind of datastore, you might not need to read this blog because all of the cached data can be paginated in whatever format you think is best for your use case.

The blog will cover the following topics:

  • Introduction about Truto

  • What is pagination?

  • Different formats of pagination

  • How Truto normalizes the pagination formats

  • Quirks and edge cases

This guide is intended for developers with prior experience in interacting with APIs and building API integrations.

Introduction about Truto

Truto is a Unified API platform that allows you to build native HTTP based API integrations within your application. It handles all the grunt work related to API integrations like authentication, pagination, and error handling. All of this in real-time without storing any of your end-user data.

What is pagination?

If you have ever worked with REST or GraphQL APIs, you might have come across pagination. Pagination is a technique used to break down a large set of data into smaller chunks. For example, if you have a list of 1000 contacts in a CRM, you might not want to or cannot fetch all of them at once in certain scenarios. Pagination allows you to fetch a subset of the data at a time, like 20 or 50 contacts at a time.

Different formats of pagination

Some of the most common pagination formats are:

1. Offset-based pagination: This format uses an offset and a limit to fetch the data. For example, to fetch the first 20 contacts, you would use an offset of 0 and a limit of 20. To fetch the next 20 contacts, you would use an offset of 20 and a limit of 20. The most common starting offset is 0, but some APIs might use 1 as the starting offset. Most APIs out there use query parameters to specify the offset and limit. For example:

GET /contacts?offset=0&limit=20 GET /contacts?offset=20&limit=20

Some of them might also specify it in the request body.

2. Page-based pagination: This format uses a page number and a page size to fetch the data. For example, to fetch the first page of 20 contacts, you would use page number 1 and a page size of 20. To fetch the next page of 20 contacts, you would use page number 2 and a page size of 20. The most common starting page number is 1, but some APIs might use 0 as the starting page number.

GET /contacts?page=1&pageSize=20 GET /contacts?page=2&pageSize=20

3. Cursor-based pagination: This format uses a cursor to fetch the data. The cursor is a unique identifier that points to a specific record in the dataset. To fetch the next set of data, you would use the cursor returned in the previous response.

The cursor can be opaque or transparent.

A transparent cursor is a direct reference to the record in the dataset, for example, the primary key of a record. Stripe uses transparent cursors in their APIs, where the cursor is the ID of the record and you would pass the ID of the last record fetched to get the next set of records.

An opaque cursor may or may not be a direct reference to the record in the dataset. It is usually a token that encodes the state of the query. You would pass this token to get the next set of records.

Cursor-based paginations also accept a limit to specify the number of records to fetch.

Most of the APIs use query parameters to specify the cursor and limit. For example:

GET /contacts?cursor=abc123&limit=20

The cursor can be found from the response of the previous request. It can be in the response body or in the response headers.

{ 
  "data": [ 
    { "id": 1, "name": "John Doe" } 
  ], 
    "cursor": "abc123" 
}

4. Link header pagination: This format uses the Link header to specify the next and previous URLs to fetch the data. The Link header contains the URL to the next and previous pages.

They usually have two query parameters, `page` and `pageSize` and can be thought of a special case of page-based pagination which has been standardized by RFC 5988.

Github uses this format in their APIs.

The link header looks like this:

Link: <https://api.example.com/contacts?page=2&pageSize=20>; rel="next", <https://api.example.com/contacts?page=1&pageSize=20>; rel="prev"

Here `rel="next"` specifies the URL to the next page and `rel="prev"` specifies the URL to the previous page.

How Truto normalizes the pagination formats

Simple answer: Cursor-based pagination. Truto uses cursor-based pagination to normalize the pagination formats.

Cursor-based pagination is a format which can be used to represent all the other formats of pagination mentioned above. We take the underlying pagination data and base64 encode it to create an opaque cursor. Opaque here means that the cursor is not a direct reference to the record in the dataset, but it encodes the state of the query. You can still base64 decode the cursor to get the underlying pagination data.

Truto's pagination format

Truto's pagination format contains two query parameters: `next_cursor` and `limit`. The `next_cursor` is the cursor to fetch the next set of data and the `limit` is the number of records to fetch in a single API call.

An example request to fetch the first 20 contacts from a CRM using the Unified API would look like this:

GET /api/unified/contacts?limit=20

The response from our Unified API would look like this:

{ 
  "results": [
    { "id": 1, "name": "John Doe" 
    }, 
    { "id": 2, "name": "Jane Doe" 
    }, 
    ... 
  ], 
  "next_cursor": "b2Zmc2V0PTIwJmxpbWl0PTIw" 
}

And the next request to fetch the next set of contacts would look like this:

GET /api/unified/contacts?next_cursor=b2Zmc2V0PTIwJmxpbWl0PTIw&limit=20

The next_cursor contains an opaque cursor which the client can use to fetch the next set of data.

Once the client has fetched all the data, the next_cursor will be null.

How the cursor is generated for various pagination formats

We'll now look at how the `next_cursor` is generated for various pagination formats.

Offset-based pagination

If you have an offset-based pagination which accepts an `offset` and a `page_size` query parameter, you can base64 encode the offset and limit to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `page_size` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA"
}

The cursor is base64 encoded and contains the following string: `offset=20&page_size=20`.

As you can see, the cursor contains the offset and limit which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=b2Zmc2V0PTIwJnBhZ2Vfc2l6ZT0yMA&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `offset` and `page_size` query parameters and passes them to the CRM API to fetch the next set of data.

Here `limit` query parameter of the Unified API is ignored.

The response from the CRM API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "b2Zmc2V0PTQwJnBhZ2Vfc2l6ZT0yMA"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the limit specified by the client. If the number of records is less than the limit, it means that there are no more records to fetch and the next_cursor will be null.

Page-based pagination

If you have a page-based pagination which accepts a `page` and a `pageSize` query parameter, you can base64 encode the page and pageSize to create a cursor.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIw"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20`. Here we assume the API is 1-indexed.

As you can see, the cursor contains the page and pageSize which can be used to fetch the next set of data.

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIw&limit=20

Since the `next_cursor` query parameter is now provided, Truto decodes the cursor to get the `page` and `pageSize` query parameters and passes them to the CRM API to fetch the next set of data.

Here `limit` query parameter of the Unified API is ignored.

The response from the CRM API would look like this:

{
  "data": [
    {
      "id": 21,
      "name": "John Doe"
    },
    {
      "id": 22,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0zJnBhZ2VTaXplPTIw"
}

Knowing when to stop

Truto checks if the number of records returned by the CRM API is less than the pageSize specified by the client. If the number of records is less than the pageSize, it means that there are no more records to fetch and the next_cursor will be null.

Link header pagination

The link header pagination format is easy to handle because underlying APIs directly return the exact URL to fetch the next set of data.

So the approach here is to first parse the header from the response and then extract the URL with `rel="next"` and then just use the query parameters in that as the value of the `next_cursor`.

Imagine the customer has 1000 contacts and you are fetching 20 contacts at a time. The first request to our Unified API would look like this:

GET /api/unified/contacts?limit=20&foo=bar

We map the `limit` query parameter of the Unified API to the `pageSize` query parameter of the CRM API. Sometimes, even though the APIs use link header pagination, the query parameter name might be different, so Truto makes it configurable, but by default, it uses `pageSize`.

We are also passing an extra query parameter `foo=bar` to the CRM API, which is not used in the pagination.

The response from our Unified API would look like this:

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next_cursor": "cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI"
}

The cursor is base64 encoded and contains the following string: `page=2&pageSize=20&foo=bar`.

As you can see, the cursor contains the page, pageSize, and the extra query parameter `foo`. The cursor is calculated from an link header like this:

Link: <https://api.crm.com/contacts?page=2&pageSize=20&foo=bar>; rel="next"

The next request to fetch the next set of contacts would look like this:

GET /contacts?next_cursor=cGFnZT0yJnBhZ2VTaXplPTIwJmZvbz1iYXI&limit=20

The `limit` query parameter of the Unified API is ignored.

Knowing when to stop

Truto checks if the link header contains a URL with `rel="next"`. If it does not contain a URL with `rel="next"`, it means that there are no more records to fetch and the next_cursor will be null.

Cursor-based pagination

Cursor-based pagination is easy to handle because underlying APIs directly return the cursor to fetch the next set of data.

As mentioned earlier, the cursor can be transparent or opaque, and it can be either found a field in the response body or needs to be calculated from the response body (Stripe, where the next_cursor is the ID of the last record in the response).

So Truto provides a JSON path to extract the cursor from the response.

For example, if the raw API response looks something like,

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "meta": {
    "pagination": {
      "next": "foobar"
    }
  }
}

Then the JSON path to extract the cursor would be `meta.pagination.next`.

In case of Stripe, the cursor is the ID of the last record in the response, so the JSON path would be `data[-1].id`. `-1` is used to get the last element of the array.

We just use the value found at the JSON path as the value of the `next_cursor`.

Knowing when to stop

Truto checks if the cursor (value found at the JSON path) is null. If the cursor is null, it means that there are no more records to fetch and the next_cursor will be null.

Some APIs also return a `has_more` or equivalent field which is typically a boolean, in the response body to indicate if there are more records to fetch. Truto also checks this field to know when to stop.

When APIs return the next link in the response body

In some cases, the APIs directly return the link to the next set of data in the response body regardless of the pagination format. So Truto considers them as cursor-based pagination and extracts the URL query parameters from the next link in the response body and encodes them into the `next_cursor`.

For example, if the raw API response looks something like,

{
  "data": [
    {
      "id": 1,
      "name": "John Doe"
    },
    {
      "id": 2,
      "name": "Jane Doe"
    },
    ...
  ],
  "next": "https://api.crm.com/contacts?page=2&pageSize=20"
}

Truto extracts the query parameters from the `next` field and encodes them into the `next_cursor`.

Quirks and edge cases

We will now list a few quirks and edge cases Truto handles for the various pagination formats found in the wild:

  • The API path for fetching the subsequent pages might be different from the initial request. Two well-known products have slight variations of this:

    • Dropbox: Dropbox provides one URL for the initial request and a different URL for fetching the subsequent pages, and on top of that, you can't use any other query parameters in the subsequent requests. Truto solves this by making the pagination URL configurable (pagination_path) and adding a flag to ignore the query parameters in the subsequent requests (ignore_limit_in_pagination).

    • Salesforce: In their SOQL Query API, Salesforce returns the URL to the next set of records, but here it's not enough to just encode the query parameters; the path of the URL also needs to be considered. Truto solves this by adding a flag to include the path of the URL as well (include_path).

  • Starting offset and page number might be 0 or 1. Truto handles this by making the starting offset and page configurable.

  • Enforcing a maximum limit. Truto enforces a maximum limit on the number of records that can be fetched in a single request. This is configurable based on the maximum limit of the underlying API.

  • Some APIs might always require the limit or other query parameters to be present in the request. Truto handles this by always including the default values of the query parameters in the request.

  • Not all APIs accept the pagination data in the query parameters. Some APIs accept the pagination data in the request body. Truto handles this by decoupling the pagination parsing and encoding from the request building.

  • Some cursor-based APIs never return a null cursor. So Truto checks if the number of records fetched in the response is less than the limit and stops fetching the records. It also employs a SHA-1 hash of the request body to check if the same records are being fetched again.
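The last safeguard in the list above can be sketched as follows. This is a rough illustration under stated assumptions, not Truto's implementation: the function names, the canonical-JSON serialization, and the in-memory `seen` set are all hypothetical choices.

```python
import hashlib
import json
from typing import Any, Set


def request_fingerprint(body: Any) -> str:
    """SHA-1 hex digest of a request body, serialized deterministically."""
    # Assumption: sorted keys and compact separators so equal bodies hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def should_stop(record_count: int, limit: int, next_request_body: Any, seen: Set[str]) -> bool:
    """Decide whether to stop paging when the API never returns a null cursor."""
    # A short page means the API has run out of records.
    if record_count < limit:
        return True
    # A request body seen before would fetch the same records again.
    fingerprint = request_fingerprint(next_request_body)
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False


seen: Set[str] = set()
first = should_stop(20, 20, {"cursor": "a"}, seen)   # full page, new request
repeat = should_stop(20, 20, {"cursor": "a"}, seen)  # same request again
```

The short-page check alone misses the case where the total record count is an exact multiple of the limit, which is why a repeat-detection step is a useful second line of defence.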
