We’re open sourcing our live election night model

Emily Liu
Washington Post Engineering
6 min read · Sep 14, 2022


by Emily Liu and Lenny Bronner

We have model news again! We’ve spent the last six months re-implementing our election night model in Python and are excited to finally open source this version and share it with the community.

Our original election night model was written in R. While this was good as a first approach, it led to a few constraints around deployment, maintenance and flexibility. We decided it was time to merge our model with our existing infrastructure.

But what is the model?

At the Washington Post, we use our model to generate estimates of the number of outstanding votes on an election night. We also generate more granular estimates — for general elections, we estimate the partisan split of outstanding votes; for primaries, we split these estimates by candidate.

Our election model provides useful context on where the final vote may end up at the conclusion of the race. Raw vote totals are often misleading when presented on their own, due to factors such as the rise in absentee and mail-in voting and the fact that geographic areas report results at different rates (e.g., rural areas often report results earlier than urban areas). You may have seen the output of our model on our results pages as these fuzzy bars.

Example of our model visualization from November 2020

We estimate the turnout and vote total numbers by comparing live tallies to previous election results. We then extrapolate the difference between the live and historical results for geographic subunits (counties or precincts) that have yet to report results. The model is an implementation of quantile regression with demographics as features, and we use conformal prediction to quantify the associated uncertainty.
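The extrapolation idea can be sketched with a toy example. Everything here is an illustrative simplification: the column names are made up, and a single average "swing" ratio stands in for the model's actual quantile regression over demographic features.

```python
import pandas as pd

# Hypothetical county-level data: historical turnout plus live tallies.
# Counties C and D have not yet reported.
counties = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "turnout_2018": [10_000, 20_000, 15_000, 5_000],
    "turnout_live": [11_000, 23_000, None, None],
})

# Compare live tallies to previous results for counties that have reported.
reported = counties.dropna(subset=["turnout_live"])
swing = (reported["turnout_live"] / reported["turnout_2018"]).mean()

# Extrapolate that change to counties that have yet to report.
unreported = counties["turnout_live"].isna()
counties.loc[unreported, "turnout_est"] = (
    counties.loc[unreported, "turnout_2018"] * swing
)

# Estimated final turnout: live tallies where we have them,
# extrapolated values where we don't.
estimated_total = counties["turnout_live"].fillna(counties["turnout_est"]).sum()
```

The real model replaces the single ratio with quantile regression, which also yields the upper and lower bounds behind the fuzzy bars.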

For the sake of this blog post, we’ll focus on the architecture decisions we made from an engineering perspective. For more information on how we calculate the estimates, you can read more here and read a graphic explainer of how election models work here. Further reading is linked at the end of this post.

From R to Python

In 2020, we built and open-sourced an election model written in R. The model was hosted on a server via Amazon EC2 that our election pipeline could access via an API. Unfortunately, R is not a very popular language on our team (sorry!) and we were severely constrained by the response time from pinging the model API. The setup was also not as fast, scalable, or interoperable with AWS as we wanted, and it was hard to share code written in R with the rest of our codebase. It was likewise difficult to parse errors and determine whether they came from the API or the model implementation itself, especially under the time crunch of an election night. We wanted a better way to deploy the model.

Simplified diagram of our former pipeline

As a result, after the November 2021 general election, we began the process of rewriting this model from R to Python.

A lot of the model re-write simply meant translating code written in the R tidyverse to the equivalent in pandas and numpy. But one larger issue was the reliance of our R implementation on the quantreg package for quantile regression. Unfortunately, we couldn’t find a numerically stable Python implementation of quantile regression. (Numerical instability can lead to inconsistent results.)

As a result, we decided to write one ourselves, and we’ve open-sourced that too. You can find our implementation (as elex-solver) below. It uses the cvxpy optimization library. We hope you might find it useful as a stand-alone package if you need to do quantile regression in Python.

From a Hosted API to a Serverless Lambda

Now, the model is deployed as a Python package that we invoke via a serverless Lambda function.

Simplified diagram of our updated election pipeline with the model implemented as a Lambda

We deployed this model as a Python package because this makes it easy for collaborators both inside and outside of the company to install and use it. Our formatters from various election results vendors are Python packages as well, so this follows our existing architectural norm and simplifies our infrastructure.

When the model was formerly deployed as an API hosted on an EC2 instance, we pinged the API at a set rate of once every minute. However, we often ran into timeout issues. Invoking each model run as a serverless Lambda function solves this, because the Lambda maximum timeout is 15 minutes, which is well over our needs. Additionally, we have future plans to further align the model with our existing architecture by invoking the model only when new results come in instead of at a fixed rate. The architecture diagram above reflects this.
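Structurally, the Lambda entry point is just a thin wrapper around the model package. The sketch below shows the shape of such a handler; the module name and its API in the comment are illustrative assumptions, not the actual package interface.

```python
import json

def handler(event, context):
    """Minimal sketch of a Lambda handler wrapping a model package.
    The event carries the run's parameters."""
    election_id = event.get("election_id", "2022-11-08_USA_G")  # hypothetical ID format
    estimand = event.get("estimand", "turnout")

    # In a real pipeline, the model package would be imported and run here,
    # e.g. (hypothetical API):
    #   estimates = elexmodel.run(election_id=election_id, estimand=estimand)
    estimates = {"election_id": election_id, "estimand": estimand}

    return {"statusCode": 200, "body": json.dumps(estimates)}
```

Because each invocation is stateless and parameterized by the event payload, the same function serves both the scheduled runs and ad-hoc test runs.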

Importantly, we are able to run Lambdas independently of the rest of our pipeline (staging and production state machines). As a result, we preserve the flexibility of testing model runs with varying input parameters on election night, so we can tweak these parameters as we see fit as more information trickles in throughout the night. We provide ample logging per invocation of the model and use AWS CloudWatch to monitor the model throughout election nights.
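Tweaking a test run then amounts to constructing a different event payload and invoking the function out of band. The parameter names and function name below are illustrative assumptions, not the model's actual configuration surface.

```python
def build_payload(election_id, percent_reporting_threshold=100, features=()):
    """Assemble an event payload for an ad-hoc model run.
    All parameter names here are hypothetical examples."""
    return {
        "election_id": election_id,
        "percent_reporting_threshold": percent_reporting_threshold,
        "features": list(features),
    }

payload = build_payload("2022-11-08_USA_G", 90, ["median_income"])

# To fire it against AWS (requires credentials):
#   import json, boto3
#   boto3.client("lambda").invoke(
#       FunctionName="election-model",   # hypothetical function name
#       InvocationType="Event",          # asynchronous invocation
#       Payload=json.dumps(payload).encode(),
#   )
```

Each such invocation shows up as its own CloudWatch log stream, which keeps ad-hoc experiments separate from the staging and production state machines.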

This model is a continuous work in progress. You can browse the code yourself here. Installation instructions are included in the README.

Additional Reading

Over the course of the last two years, we have published multiple resources to share our progress. We will continue to share our work publicly, and we welcome feedback!

If you’re interested in this work and want to learn more, we recommend reading this starter guide on what our team had to learn about election data or reaching out to us.

Acknowledgements

This work would not be possible without the many, many people who have collaborated with us and supported us along the way. Thank you to the elections engineering team for their contributions to this project: Jeremy Bowers, Dana Cassidy, John Cherian, Holden Foreman, Dylan Freedman, Chloe Langston, Brittany Mayes, Anthony Pesce, Erik Reyna and Chris Zubak-Skees.

Thank you to Dylan Freedman for editing this post.
