I previously posted a lottery data visualization in which I attempted to correlate how often a certain number came up in the PA lottery given that some other number came up. I built the visualization using Python, SQLite, and Chart.JS on the front end. Since, I like sql I'll about the queries and methods used. These methods could probably be massively improved, but I did it in a couple of hours, because I was bored.
The underlying database is SQLite. There are 3 tables in the database.
The rules are as follows a game has many drawings which then has many numbers of different types drawn. As there are PA lottery games with things like the Powerball and some multiplier, we need to differentiate numbers based on type. However, we will only display the Pick N games, because I think they make the best looking charts.
Originally I attempted to do the entire prediction using SQL, but I couldn't get that to work quite right (I was getting double counts, I'm sure it's possible, but I couldn't get it). So I moved to using Python to do the aggregation.
So I actually ended up retrieving all the normal numbers for a game grouped by their drawing with a query like the below.
From there I processed all the numbers like so:
The gist of the code is as follows, build a series of counts for each drawing. Then create a matrix with a column for all possible predictors by index. From here I only count each number once, and require when the predictor is the same as the value, it's count must be greater than 1. This prevents double counts. From there I just wrap it up for Chart.JS to deal with it.
I also provide a histogram of the numbers pulled in a game with no groupings. That's powered by a query like so:
It's a pretty standard aggregation and histogram query.
The Front End
charting library. It's pretty trivial, I just tie the input fields
onchange events to reload the chart from a CGI script on the back end
(Yes I know CGI is archaic, but is there a better solution that doesn't
require running a daemon in the background or php. After all, I'm happy
with my shared host).
I also pull in the game details using a CGI script, so that I don't have to manually update the options or date ranges that I have available. I could change it to be regenerated, but that increases the complexity of my deploy process and removes the static nature of my site, and I would like it to stay that way.
Getting the Numbers
As for getting the numbers, I just scraped the PA Lottery's numbers RSS feed using the feedburner python module. This is the kind of stuff that needs to be public to prevent corruption, and I'm glad they still publish via RSS. Then I setup a daily scheduled event on my host to get the numbers and stick them in the database.
So I hope you found this article an informative look into my thought process when I build something. I hope you learned that you can still do things the old way and get good performance that is mostly hands off. Finally, I hope you take a look at my data visualization, I don't think it shows that much yet, but as we get more data I think we'll eventually get a fairly interesting picture.