Migrating Data from One DynamoDB Table to Another

Recently at work, I set up a new CloudFormation stack with the Serverless Framework. Because all resources related to the stack (databases, logging, monitoring, etc.) should also be managed by CloudFormation (to have a single deployable unit), I also had to create a new DynamoDB table to go along with the service (see the CloudFormation docs for DynamoDB tables):

resources:
  Resources:
    AwesomeTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: AwesomeTable-production
        AttributeDefinitions:
          - AttributeName: awesome_id
            AttributeType: S
        KeySchema:
          - AttributeName: awesome_id
            KeyType: HASH
        BillingMode: PAY_PER_REQUEST

However, I still wanted to retain the data from the original DynamoDB table. This StackOverflow post helped me come up with the following shell script. It reads the data in batches from the old table and inserts it into the new one. The only dependencies for the script are awscli and jq.

#!/bin/bash
set -euo pipefail

OLD_TABLE=AwesomeTable
NEW_TABLE=AwesomeTable-production
TMP_FILE=/tmp/inserts.json
batchSize=25

# Read the first batch of items, transform it into batch-write-item's
# request format, and write it to the new table
DATA=$(aws dynamodb scan --table-name "$OLD_TABLE" --max-items "$batchSize")
echo "$DATA" | jq ".Items | {\"$NEW_TABLE\": [{\"PutRequest\": {\"Item\": .[]}}]}" > "$TMP_FILE"
aws dynamodb batch-write-item --request-items "file://$TMP_FILE"

# NextToken is only present while there is more data to paginate through;
# '// empty' turns an absent token into an empty string instead of "null"
nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
while [[ -n "$nextToken" ]]
do
  DATA=$(aws dynamodb scan --table-name "$OLD_TABLE" --max-items "$batchSize" --starting-token "$nextToken")
  if [[ $(echo "$DATA" | jq '.Items | length') -eq 0 ]]; then
    echo "Scan returned no data. Finished operation"
    exit 0
  fi
  echo "$DATA" | jq ".Items | {\"$NEW_TABLE\": [{\"PutRequest\": {\"Item\": .[]}}]}" > "$TMP_FILE"
  aws dynamodb batch-write-item --request-items "file://$TMP_FILE"
  nextToken=$(echo "$DATA" | jq -r '.NextToken // empty')
done

The code should be fairly self-explanatory. First we set up the required variables. Then we read the first batch of items from the old table, transform them with jq into the request format that batch-write-item expects (stored temporarily in a file), and write them to the new table. The batch size of 25 matches the maximum number of items a single BatchWriteItem request accepts.
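
To see the jq transformation in isolation, here is a minimal, self-contained sketch using a made-up item that matches the awesome_id key schema above (no AWS access needed; the -c flag is only used here to keep the output on one line):

```shell
# A scan response with a single item in DynamoDB's attribute-value format
# (hypothetical data)
DATA='{"Items": [{"awesome_id": {"S": "id-1"}}], "Count": 1}'
NEW_TABLE=AwesomeTable-production

# Same transformation as in the script: wrap every item in a PutRequest
echo "$DATA" | jq -c ".Items | {\"$NEW_TABLE\": [{\"PutRequest\": {\"Item\": .[]}}]}"
# → {"AwesomeTable-production":[{"PutRequest":{"Item":{"awesome_id":{"S":"id-1"}}}}]}
```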

If the API response contains a NextToken, which indicates that there is more data in the old table, we keep repeating these steps until the scan returns no more items.

The output should look something like this:

{
    "UnprocessedItems": {}
}
{
    "UnprocessedItems": {}
}
{
    "UnprocessedItems": {}
}
{
    "UnprocessedItems": {}
}
{
    "UnprocessedItems": {}
}
Scan returned no data. Finished operation

Please note that this only works for an empty NEW_TABLE. Updating data in an existing table is not supported by this script.

I tried to verify the size of the new table as a basic sanity check, but unfortunately, as the AWS documentation notes: “Storage size and item count are not updated in real-time. They are updated periodically, roughly every six hours.”

aws dynamodb describe-table --table-name NEW_TABLE | jq '.Table | {"TableSizeBytes": .TableSizeBytes, "ItemCount": .ItemCount}'
{
  "TableSizeBytes": 0,
  "ItemCount": 0
}

Of course, you can still verify that the data is available by querying individual items.
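
For a spot check that does not depend on the delayed table statistics, a single item can be fetched by its hash key. The following is a sketch assuming the awesome_id key schema from above and a hypothetical item id; the jq check at the end runs on a canned response here, since the actual call requires AWS credentials:

```shell
# Fetch one item from the new table by its hash key (hypothetical id):
#   aws dynamodb get-item --table-name AwesomeTable-production \
#     --key '{"awesome_id": {"S": "id-1"}}'

# A successful get-item response looks like this (canned example):
RESPONSE='{"Item": {"awesome_id": {"S": "id-1"}}}'

# jq -e sets a non-zero exit code if .Item is missing or null,
# so the message only prints when the item actually exists
echo "$RESPONSE" | jq -e '.Item' > /dev/null && echo "item found"
# → item found
```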