Using a Bash Script to Generate Random Fake Data To Insert Into DynamoDB
2021 March 30

I’ve been studying for my developer associate certification on Amazon Web Services. Lately I’ve been practicing with the various messaging and streaming services like SQS and Kinesis. I wanted to branch into using DynamoDB and capturing changes, and to do that I needed to generate a lot of fake data.

We’ll start by creating our test table using CloudFormation. Here’s the Resources section only:

{
    "TestingTable": {
      "Type" : "AWS::DynamoDB::Table",
      "Properties" : {
        "BillingMode" : "PROVISIONED",
        "AttributeDefinitions" : [
          {
            "AttributeName": "mySortKey",
            "AttributeType" : "S"
          },
          {
            "AttributeName" : "referenceId",
            "AttributeType" : "B"
          }
        ],
        "KeySchema" : [
          {
            "AttributeName": "referenceId",
            "KeyType" : "HASH"
          },
          {
            "AttributeName" : "mySortKey",
            "KeyType" : "RANGE"
          }
        ],
        "ProvisionedThroughput" : {
          "ReadCapacityUnits" : "5",
          "WriteCapacityUnits" : "5"
        }
      }
    }
}

Place the above in a standard CloudFormation template and it will create a DynamoDB table in the stack. At the moment there doesn’t seem to be a way to automatically connect a Kinesis stream to DynamoDB using plain CloudFormation, so just set that up in the console. The DynamoDB table has two required key attributes, referenceId and mySortKey. The physical names of the created resources can be captured in the shell like so, for future commands involving them:

KINESIS_STREAM=$(aws cloudformation describe-stack-resource --stack-name $STACK_NAME \
    --logical-resource-id ResultStream \
    --query 'StackResourceDetail.PhysicalResourceId' --output text)
DB_TABLE=$(aws cloudformation describe-stack-resource --stack-name $STACK_NAME \
    --logical-resource-id TestingTable \
    --query 'StackResourceDetail.PhysicalResourceId' --output text)

The records accepted by the table may look something like the following; the only required keys in the JSON blob are the two key attributes above. Additionally, one must spell out the data type of each value, because DynamoDB’s low-level API requires an explicit type descriptor on every attribute.

{
    "mySortKey": {"S": "xyz"},
    "referenceId": {"B": "ABCDEF"},
    "payload": {"S": "bar"},
    "payload2": {"S": "foo"},
    "seq": {"N": "1"}
}

Generating blobs like this by hand is monotonous at best and massively painful at worst, so let’s automate the generation. The first step is to write a function that generates random alphanumeric characters, since that’s good enough for this:

function make_random_string() {
    LENGTH=$1
    # LC_ALL=C keeps tr from rejecting invalid multibyte sequences
    # that /dev/urandom can produce under a UTF-8 locale
    LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c "$LENGTH"
}
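As a quick sanity check, the helper should hand back exactly the number of characters asked for (the function is repeated here so the snippet runs on its own):

```shell
# Quick check of the random-string helper. LC_ALL=C guards against
# tr choking on invalid multibyte sequences on some platforms.
function make_random_string() {
    LENGTH=$1
    LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c "$LENGTH"
}

s=$(make_random_string 16)
echo "${#s}"    # prints 16
```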

This shell function takes a single parameter, the number of alphanumeric characters to generate. This lets us make up random values for the fields. Next, let’s write a function that writes the blob of JSON, properly formatted, to a known file. That file will be on the tmpfs to avoid writing garbage to our SSD.

function make_item_json() {
    # a short base64 blob is enough for a binary-typed key value
    uuid=$(echo $RANDOM | base64)
    test_case_name=$(make_random_string 250)
    payload=$(make_random_string 500)
    cat > /tmp/item.json << EOF
    {
        "mySortKey": {"S": "$1"},
        "referenceId": {"B": "$uuid"},
        "payload": {"S": "$payload"},
        "payload2": {"S": "$test_case_name"},
        "seq": {"N": "$1"}
    }
EOF
}

The above shell function uses a heredoc to cat the JSON into the file. This allows shell substitution to work without complicated or error-prone echo statements.
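If put-item ever complains about the item file, a quick way to debug is to run the generated file through a JSON parser. This sketch writes one item the same way the heredoc does and checks it; it assumes python3 is on the PATH and uses its stdlib json.tool (jq works just as well):

```shell
# Write one item the same way the heredoc does, then confirm the
# result parses as JSON before handing it to the AWS CLI.
uuid=$(echo $RANDOM | base64)
cat > /tmp/item.json << EOF
{
    "mySortKey": {"S": "1"},
    "referenceId": {"B": "$uuid"},
    "seq": {"N": "1"}
}
EOF
python3 -m json.tool /tmp/item.json >/dev/null && echo "valid JSON"
```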

Finally, it’s time to start writing them out to DynamoDB. A standard for loop is used, calling make_item_json for each item, and the AWS CLI creates the new items.

for i in {1..5000}; do
    echo $i
    make_item_json $i
    aws dynamodb put-item \
        --table-name $DB_TABLE \
        --item file:///tmp/item.json
done
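One put-item call per record means 5000 round trips to the API. A faster variant, sketched here under the assumption that the make_item_json helper and $DB_TABLE variable from above are in scope, groups items into batch-write-item requests of 25, the API’s per-call maximum (the make_batch_json name and file layout are mine, not from the AWS CLI):

```shell
# Sketch: build a BatchWriteItem request file holding up to 25 puts,
# so 5000 items take 200 API calls instead of 5000.
function make_batch_json() {
    local start=$1 count=$2 i
    {
        printf '{"%s": [' "$DB_TABLE"
        for ((i = 0; i < count; i++)); do
            [ "$i" -gt 0 ] && printf ','
            make_item_json $((start + i))    # writes /tmp/item.json
            printf '{"PutRequest": {"Item": %s}}' "$(cat /tmp/item.json)"
        done
        printf ']}'
    } > /tmp/batch.json
}

# The driver loop would then look like this (not run here):
# for start in $(seq 1 25 5000); do
#     make_batch_json "$start" 25
#     aws dynamodb batch-write-item --request-items file:///tmp/batch.json
# done
```

Note that batch-write-item can return UnprocessedItems when the table is throttled, so a production script would need a retry loop; for throwaway test data the sketch above is usually enough.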


*****
Written by Henry J Schmale on 2021 March 30