CSV To DynamoDB
The AWS Python SDK (Boto3) provides a “batch writer”, not present in the other language SDKs, that makes batch writing data to DynamoDB extremely intuitive.
A task came up where I needed to write a script to upload about 300,000 unique rows from a PostgreSQL query to a DynamoDB table. At first, the task seemed trivial, but my initial approach, using the `aws` CLI in a bash script to parse an exported JSON file, left me a bit confuzzled. One iteration looked like it would take around 80 hours to complete 😐 and another one completely crashed my laptop.
I eventually dug into the source code behind the top Google result for “csv to dynamodb”, an AWS Database Blog post. I stripped out the unnecessary parts, and the solution turned out to be simple and almost trivial.
With this script, parsing a 300,000-line CSV file and writing each row to DynamoDB took about 7 minutes from my local machine. The code is also extremely simple!
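The full script isn't reproduced here, but the core of the approach can be sketched roughly as follows (the CSV layout and table name are placeholder assumptions; `batch_writer` is the Boto3 feature discussed above):

```python
import csv


def read_rows(csv_path):
    """Parse a CSV file into a list of dicts, one per row.

    The header row supplies the keys, so each dict can be passed
    straight to put_item as a DynamoDB item.
    """
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def upload(rows, table_name):
    """Write rows to a DynamoDB table using Boto3's batch writer.

    Requires boto3 and valid AWS credentials; the import is deferred so
    the parsing helper above works without boto3 installed.
    """
    import boto3

    table = boto3.resource("dynamodb").Table(table_name)
    # batch_writer groups put_item calls into BatchWriteItem requests
    # (up to 25 items each) and resends any unprocessed items for you.
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item=row)
```

The `with` block is where the magic happens: the context manager buffers items, flushes them in batches, and flushes any remainder on exit, so the loop body never has to think about batch sizes at all.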
The low-level BatchWriteItem API exists for DynamoDB, but it's not intuitive to use. Boto3's additional batch writer method, not present in the other official SDKs, is what makes batch writing extremely straightforward. This lack of feature parity between official SDKs for the de facto cloud provider was actually kind of surprising.
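For a sense of what the batch writer saves you from, here is a rough sketch of the bookkeeping that SDKs without it leave to the caller: splitting items into the 25-item chunks that BatchWriteItem accepts (retrying any UnprocessedItems in each response is further work on top of this):

```python
def chunk(items, size=25):
    """Split items into lists of at most `size` elements.

    25 is DynamoDB's per-request limit for BatchWriteItem; without a
    batch writer, the caller must do this chunking (and retry any
    UnprocessedItems returned by each request) by hand.
    """
    return [items[i:i + size] for i in range(0, len(items), size)]
```

At 300,000 rows that is 12,000 requests to assemble, send, and check for unprocessed items, which is exactly the loop `batch_writer` hides.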