CSV To DynamoDB

The AWS Python SDK (Boto3) provides a “batch writer”, not present in the other language SDKs, that makes batch writing data to DynamoDB extremely intuitive.

Kevin Wang


A task came up where I needed to write a script to upload about 300,000 unique rows from a PostgreSQL query to a DynamoDB table. At first, the task seemed trivial, but my initial approach, using jq and the aws CLI in a bash script to parse an exported JSON file, left me a bit confuzzled. One iteration looked like it would take around 80 hours to complete 😐 and another one completely crashed my laptop.

The solution

I eventually dug into the source code behind the top Google result for "csv to dynamodb", an AWS Database Blog post. After stripping out the parts I didn't need, the remaining solution turned out to be refreshingly small.

With this script, parsing a 300,000-line CSV file and writing each row to DynamoDB took about 7 minutes from my local machine. The code is also extremely simple!
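Here's a minimal sketch of that approach. The table name, CSV path, and column layout are placeholders rather than the actual values from my script, and it assumes the table's partition key is one of the CSV columns and is a string.

```python
import csv

import boto3

# Placeholder values -- swap in your own table name and file path.
TABLE_NAME = "my-table"
CSV_PATH = "export.csv"


def main():
    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    with open(CSV_PATH, newline="") as f:
        reader = csv.DictReader(f)

        # batch_writer buffers put_item calls into 25-item
        # BatchWriteItem requests and retries unprocessed items
        # automatically.
        with table.batch_writer() as batch:
            for row in reader:
                # Skip empty columns so we don't write empty attributes.
                item = {k: v for k, v in row.items() if v != ""}
                batch.put_item(Item=item)


if __name__ == "__main__":
    main()
```

Note that DictReader yields every value as a string; if some columns should be numbers or booleans, you'd convert them before the put_item call.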

Learnings

I'm most familiar with JavaScript/Node.js, but my day-to-day work recently has been mostly Kotlin with a bit of Python. For this task, I considered each of those languages' AWS SDKs, as well as the CLI.

The AWS Java SDK, AWS JavaScript SDK, AWS Python SDK (Boto3), and AWS CLI all expose DynamoDB's batch write item operation (batch_write_item, batchWriteItem, or batch-write-item, depending on the tool), but it's not intuitive to use: you have to chunk requests into groups of 25 yourself and retry anything that comes back unprocessed. However, I learned that Boto3 has an additional Table.batch_writer method, not present in the other libraries, that makes batch writing extremely straightforward.
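For comparison, here's roughly what the low-level call looks like. This is just a sketch with a hypothetical table name and keys; the 25-item limit and the UnprocessedItems handling are exactly what batch_writer hides for you.

```python
import boto3

client = boto3.client("dynamodb")

# batch_write_item accepts at most 25 requests per call and uses the
# low-level attribute-value format ({"S": ...}, {"N": ...}, ...).
response = client.batch_write_item(
    RequestItems={
        "my-table": [  # hypothetical table name
            {"PutRequest": {"Item": {"id": {"S": "1"}, "name": {"S": "foo"}}}},
            {"PutRequest": {"Item": {"id": {"S": "2"}, "name": {"S": "bar"}}}},
        ]
    }
)

# Throttled writes come back in UnprocessedItems and must be retried
# by the caller, typically with exponential backoff.
unprocessed = response.get("UnprocessedItems", {})
```

With batch_writer, the same work collapses to a `with` block and plain put_item calls, as in the script above.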

This lack of feature parity between official SDKs from the de facto cloud provider was actually kind of surprising.