🚨 Important: If you're in the US, remember to vote on Tuesday November 5th! 🗳️

🇺🇸 Also check your voter registration, some states have been purging voting rolls. 🇺🇸

Search Suggestions on My Website
2021 July 27

I previously talked about the search engine I recently added to my blog, and this is an extension to that previous post. Here I will discuss how I added a simple search suggestion option. It works for my website with the limited vocabulary and time to return results, but if I was working on a larger site I might try to make it smarter. So here’s how I implemented it.

SQLite’s full text search engine has a special virtual table called fts4aux that provides access to the underlying index and the words that the tokenizers found and parsed out. So all I did was add that table to the deployed database file and create a query that matched all words that begin with some prefix of the last term, and apply some ranking to pick out non-stop words. It then ranks the completions by the number of results for that potential completion. Which is apparently very cheap, because python is not being called during the ranking making this code much faster than would be expected. If you’re interested you can see the SQL below:

SELECT
    search_term,
    (SELECT count(*) FROM posts where posts match t.search_term) as num_results
FROM (
    SELECT 
        ? || ' ' || term as search_term 
    FROM posts_aux WHERE col = '*' AND term LIKE ?
    ORDER BY documents * occurrences ASC LIMIT 10 
) t
ORDER BY num_results DESC

I would be much happier if I could have used the spellfix1 extension to generate better potential results, but AWS lambda doesn’t support loading SQLite extensions for some reason. That would have allowed me to provide better alternatives for misspelled words, but it’s not like I can do anything without creating and maintaining a custom runtime.

There are also some issues with how the built-in porter stemmer interacts with the terms on my site. It tries to take the words back to their smallest root avoiding any kind past-tense or verbifiying. So you’ll get some weird combinations as suggestions like these:

Which is not necessarily a bad thing, but it is something to be aware of depending on the stemmer that is used. It expands the potential terms that can be related and helps with synonyms.

As for deployment it was simply deployed as another lambda function to the http api, with a small amount of memory since SQLite is so awesome at being memory efficient. I hope you found this mildly informative on what is possible with SQLite, and minimal python.


Remember you can also subscribe using RSS at the top of the page!

Share this on → Mastodon Twitter LinkedIn Reddit

A selected list of related posts that you might enjoy:

*****
Written by Henry J Schmale on 2021 July 27
Hit Counter