A single DynamoDB query can return at most 1MB of results. So, how can you get all of the results of a query if the result set is bigger than that? There are two ways: the simple way and the cool way.
How DynamoDB Query Pagination Works
Before we start getting an insane number of records, let's back up and talk about how pagination works in DynamoDB. When a query has more records than it can return, either because you set the Limit property or because there's more than 1MB of results, DynamoDB will return all of the records that it can, along with a LastEvaluatedKey property.
As long as the LastEvaluatedKey property is set in the query results, there may still be results left to get. If you want to get those results, update the original query parameters so that ExclusiveStartKey is set to LastEvaluatedKey and run the query again. Once a response comes back without a LastEvaluatedKey, you have reached the end of the results.
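As a rough illustration of that handshake, here is a minimal sketch that swaps the real table for an in-memory stand-in. `fakeQuery`, `PageInput`, `PageOutput`, and `table` are hypothetical names invented for this example, not part of the AWS SDK; only the property names `Limit`, `ExclusiveStartKey`, `Items`, and `LastEvaluatedKey` mirror DynamoDB's. The stand-in is also simplified: it sets `LastEvaluatedKey` only when records really remain.

```typescript
// Hypothetical shapes that mimic only the paging fields of a DynamoDB query.
type Key = { id: number };
type PageInput = { Limit: number; ExclusiveStartKey?: Key };
type PageOutput = { Items: Key[]; LastEvaluatedKey?: Key };

// In-memory stand-in for a table holding five records with ids 1..5.
const table: Key[] = [1, 2, 3, 4, 5].map(id => ({ id }));

// Hypothetical stand-in for db.query(): returns at most `Limit` records that
// come after ExclusiveStartKey, and sets LastEvaluatedKey only if records remain.
function fakeQuery(input: PageInput): PageOutput {
  const startKey = input.ExclusiveStartKey;
  const start = startKey
    ? table.findIndex(record => record.id === startKey.id) + 1
    : 0;
  const items = table.slice(start, start + input.Limit);
  const more = start + items.length < table.length;
  return {
    Items: items,
    LastEvaluatedKey: more ? items[items.length - 1] : undefined,
  };
}

// First request: no ExclusiveStartKey, so paging starts at the beginning.
const params: PageInput = { Limit: 2 };
const firstPage = fakeQuery(params);

// LastEvaluatedKey is set, so copy it into ExclusiveStartKey and query again.
params.ExclusiveStartKey = firstPage.LastEvaluatedKey;
const secondPage = fakeQuery(params);

// Repeat once more; this page has no LastEvaluatedKey, so paging is done.
params.ExclusiveStartKey = secondPage.LastEvaluatedKey;
const thirdPage = fakeQuery(params);
```

With five records and a `Limit` of 2, the first two pages each carry a `LastEvaluatedKey` and the third does not, which is the signal to stop.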
The Simple Way
The simple way is the most obvious: use a do-loop. Make the loop condition check whether ExclusiveStartKey was set. If it is set, then there may be more records, and you should run the query again.
import * as AWS from "aws-sdk";
import { inspect } from "util";

// Runs the query repeatedly until DynamoDB stops returning a LastEvaluatedKey,
// collecting every page into a single array.
// Note: this updates ExclusiveStartKey on the query object that is passed in.
export const getAllResults = async <T>(
  db: AWS.DynamoDB.DocumentClient,
  query: AWS.DynamoDB.DocumentClient.QueryInput
): Promise<Array<T>> => {
  const result = [] as Array<T>;
  console.info(`Querying Dynamo`, query.ExpressionAttributeValues);
  let queryResult: AWS.DynamoDB.DocumentClient.QueryOutput;
  do {
    queryResult = await db.query(query).promise();
    console.info(`Query Result`, queryResult);
    // Resume the next page where this one stopped (undefined on the last page).
    query.ExclusiveStartKey = queryResult.LastEvaluatedKey;
    queryResult.Items?.forEach(item => result.push(item as T));
  } while (query.ExclusiveStartKey);
  console.info(
    `Found ${result.length} results for ${query.KeyConditionExpression}`,
    inspect(query.ExpressionAttributeValues)
  );
  return result;
};
Using an Async Generator
The problem with a do-loop is that all of the results will be stored in memory at once. If you have a lot of records, that could be a problem. The fancier way to do this is with an async generator, which Node.js supports natively.
An async generator is a special type of iterator that allows you to yield values as they're retrieved. The following code sample creates the DynamoDB async generator.
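import * as AWS from "aws-sdk";

// Yields records one at a time and fetches the next page only when the
// consumer asks for more, so only one page is held in memory at a time.
export async function* getAllResults<T>(
  db: AWS.DynamoDB.DocumentClient,
  query: AWS.DynamoDB.DocumentClient.QueryInput
): AsyncGenerator<T> {
  console.info(`Querying Dynamo`, query.ExpressionAttributeValues);
  let queryResult: AWS.DynamoDB.DocumentClient.QueryOutput;
  do {
    queryResult = await db.query(query).promise();
    console.info(`Query Result`, queryResult);
    // Resume the next page where this one stopped (undefined on the last page).
    query.ExclusiveStartKey = queryResult.LastEvaluatedKey;
    if (queryResult.Items) {
      // Cast so the yielded records match the generic type T.
      yield* queryResult.Items as T[];
    }
  } while (query.ExclusiveStartKey);
}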
Calling It
Now that the generator is created, you can call it using the following awkward syntax.
for await (const record of getAllResults(db, query)) {
  console.info(record);
}
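One practical consequence of the generator version is that the caller decides when to stop. The sketch below replaces the DynamoDB call with a hypothetical in-memory `fetchPage` helper and counts how many pages are requested when the loop breaks early. `fetchPage`, `allRecords`, `takeFirst`, and `pagesFetched` are names made up for illustration, not SDK functions.

```typescript
// Hypothetical page source: three pages of two records each.
type Page = { items: number[]; next?: number };
const pages: number[][] = [[1, 2], [3, 4], [5, 6]];
let pagesFetched = 0;

// Hypothetical stand-in for one db.query() round trip.
async function fetchPage(index: number): Promise<Page> {
  pagesFetched++;
  const next = index + 1 < pages.length ? index + 1 : undefined;
  return { items: pages[index], next };
}

// Same shape as the DynamoDB generator: yield a page's records, then fetch
// the next page only if the consumer keeps iterating.
async function* allRecords(): AsyncGenerator<number> {
  let cursor: number | undefined = 0;
  do {
    const page: Page = await fetchPage(cursor);
    yield* page.items;
    cursor = page.next;
  } while (cursor !== undefined);
}

// Collect records until we have `count` of them, then stop iterating.
async function takeFirst(count: number): Promise<number[]> {
  const taken: number[] = [];
  if (count <= 0) return taken;
  for await (const record of allRecords()) {
    taken.push(record);
    // Breaking out ends the generator, so no further pages are requested.
    if (taken.length >= count) break;
  }
  return taken;
}
```

Here `takeFirst(3)` resolves to `[1, 2, 3]` after requesting only two of the three pages. The do-loop version would have fetched all three before returning anything.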
Conclusion
To conclude, if you have a DynamoDB query that returns more than 1MB of results, the simple way to handle pagination is to use a do-loop, and the more memory-efficient way is to use an async generator.