I always wanted to have search on my site, but it’s statically hosted, so I can’t have any dynamic content on it. I recently resolved that by creating a little tiny lambda function on AWS that queries a SQLite database hosted on S3. Here I’ll walk you through how I created the back-end for it, and in some later posts I’ll include details on calculating the cost.
The first step is to build the SQLite database using the FTS3 extension. In order to do that I created a simple ruby plugin for jekyll that looped through all the posts and inserted them into an FTS table into a private location that would not be deployed to the site. The FTS table looks something like so:
We are using FTS3 instead of FTS4, because that’s what’s compiled and
enabled on the AWS Lambda container image, and it doesn’t feel like it’s
particularly wise to create and maintain an entire image for something
that I don’t reall want to think about. You can check if FTS4 is
PRAGMA compile_options. Also note that the porter
tokeniser is used because it provides much better stemming and matching.
So the next step is to actually determine our search infrastructure which is something like so:
API Gateway <-> Lambda Handler <-> S3 Bucket
The API gateway enables a domain and a series of routes which can be used to invoke the lambda function. The S3 bucket is where we upload the SQLite database, and where the lambda function pulls it down from. This setup work is cached by the lambda function since the same task might be invoked multiple times and we don’t have to pay those start up costs every single time the function is called. The next step is to define some cloudformation so that it can be quickly deployed without messing with a GUI. Remember that infrastructure as code is the way of the future. You’ll note that I use JSON over YAML for my cloudformation. I just find that JSON has a stronger formatting requirement and is less ambigious than YAML.
This basically defines our routes and tells cloudformation where to get everything that needs to be setup. You’ll take note of the CorsConfiguration defintion, it says what other origins are allowed to call this API, and actually get the results. You’ll see it contains my website, and where I test it at.
So let’s move on to the actual code within the lambda function. I’m using a python runtime because it’s what I know, the standard library is huge, and boto3 is already availabe in lambda. I’m not going to go too deep in the code, but here’s a pseudo outline:
Basically the lambda runtime invokes the
lambda_handler, as requests
come in and goes to sleep if there’s nothing to process. Anything
outside of the handler gets executed exactly once if it’s at the top
level. So at the beginning the SQLite database is downloaded from S3,
and it’s potentially kept open for multiple requests. A SQL query is
executed using the various FTS functions. For a better introduction on
that I recommend this blog post by Charles Leifer.
Finally, a simple search page is built using jekyll, that uses fetch to make a request and render the results to the page. Feel free to check it out at the Search Page, and the source is directly embedded into the page.
UPDATE 2021/07/10 TIL that a database can be packaged up with lambda saving a call to s3, making this search even cheaper to run. Now if this were for a multisite deployment I would probably keep s3 in there and have a database for each site, but I don’t. Just replace the CodeUri in the cloudformation with a path to a zip file containing your code and the database. It would matter more if I had more long form content, but I don’t and the size limit for lambda is very generous (50MB zipped). This is also faster, because there’s fewer network requests made.