<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Washington Post Engineering - Medium]]></title>
        <description><![CDATA[Posts from The Washington Post’s engineering teams. - Medium]]></description>
        <link>https://washpost.engineering?source=rss----671ad4334cb2---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Washington Post Engineering - Medium</title>
            <link>https://washpost.engineering?source=rss----671ad4334cb2---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 07 Mar 2026 06:08:29 GMT</lastBuildDate>
        <atom:link href="https://washpost.engineering/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[How and Why We Built “Ask The Post AI”]]></title>
            <link>https://washpost.engineering/how-and-why-we-built-ask-the-post-ai-7d728178e0d6?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/7d728178e0d6</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[media]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[journalism]]></category>
            <category><![CDATA[large-language-models]]></category>
            <dc:creator><![CDATA[Jason Langsner]]></dc:creator>
            <pubDate>Thu, 28 Aug 2025 00:20:06 GMT</pubDate>
            <atom:updated>2025-08-28T00:20:06.341Z</atom:updated>
<content:encoded><![CDATA[<p>The Washington Post may be best known for our world-class and award-winning reporting, but we also pride ourselves on being industry leaders in technology and innovation. In addition to producing journalism, we develop products and solutions in house to improve our users’ experience, including through our expanding AI Pod Research &amp; Development (R&amp;D) lab and<a href="https://www.washingtonpost.com/pr/2025/06/09/office-cto-announces-sam-han-chief-ai-officer-creation-wp-incubator/"> newly launched</a> WP Incubator.</p><p>One such product is “<a href="https://www.washingtonpost.com/ask-the-post-ai/">Ask The Post AI</a>,” The Post’s in-house generative AI chatbot tool that uses Retrieval Augmented Generation (RAG) to answer user questions grounded in our journalism:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*3YpAmw27Jummj6VH" /></figure><p>As AI tools rapidly matured, we hosted a five-day AI hackathon in 2022 that brought together the collective energy and ingenuity of our engineering and product teams, in partnership with our newsroom, to rethink how we leverage AI and machine learning tools for media. That hackathon was a major success, leading to the creation of the AI Pod R&amp;D lab and pointing us towards several exciting new experiments.</p><p>One of those experiments evolved into Ask The Post AI, The Post’s way of connecting readers to a library of our reporting. To test this feature, we started with a narrow focus. Before releasing Ask The Post AI across our entire newsroom, we released<a href="https://www.washingtonpost.com/climate-environment/climate-answers/"> Climate Answers</a> in partnership with The Post’s Climate Desk. This chatbot uses a Large Language Model (LLM) to answer readers’ questions based entirely on The Post’s extensive climate and environmental coverage. 
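At a conceptual level, a retrieval-augmented pipeline like the one described can be sketched in a few lines. Everything below is a toy illustration, not The Post's implementation: the articles, the word-overlap scoring, and the function names are hypothetical stand-ins for a production vector-embedding index and an LLM call.

```python
# Toy sketch of a RAG loop: retrieve the most relevant articles, then build
# a prompt grounded in them. A real system would embed the question and the
# corpus with a vector model and send the final prompt to an LLM.

ARTICLES = [
    {"title": "Heat waves intensify", "url": "https://example.com/a1",
     "text": "Climate change is making heat waves longer and more intense."},
    {"title": "Tariffs and prices", "url": "https://example.com/a2",
     "text": "New tariffs are expected to raise consumer prices this year."},
]

def retrieve(question, k=1):
    """Rank articles by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        ARTICLES,
        key=lambda a: len(q_words & set(a["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, sources):
    """Instruct the model to answer only from the retrieved reporting."""
    context = "\n".join(f"- {a['title']} ({a['url']}): {a['text']}" for a in sources)
    return (
        "Answer using ONLY the excerpts below, and say so if they are "
        "insufficient.\n" + context + "\nQuestion: " + question
    )

question = "Will tariffs impact consumer prices?"
sources = retrieve(question)
prompt = build_prompt(question, sources)
```

Grounding the prompt in retrieved excerpts, and instructing the model to refuse when they are insufficient, is what keeps answers tied to the underlying reporting and makes per-answer citations possible.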
This was a success, but only a first step.</p><p>We scaled Climate Answers with the help of <a href="https://about.fb.com/news/2025/01/organizations-using-llama-solve-industry-challenges/">Meta’s open-source Llama LLM models</a> to create<a href="https://www.washingtonpost.com/ask-the-post-ai/"> Ask The Post AI</a>. The tool takes questions from readers and collects relevant information from across our reporting library to answer them. In other words, rather than stringing together bits of information yourself from a variety of Post articles, Ask The Post AI does it for you. Instead of using Post search to find those articles and piecing together an answer, you get an answer with citations to the original journalism.</p><p>Ask The Post AI’s answers are as trustworthy as our award-winning journalism. You will find no random sources or unwarranted assumptions, as you might with other LLMs. Each answer is based only on a library of Post reporting from the last eight years. Feed Ask The Post AI a question — like “Will tariffs impact me?” — and you’ll get an answer back in seconds, based only on credibly sourced, fact-based journalism.</p><p>Of course, like many of our developing products, this is a work in progress — and like all LLMs, Ask The Post AI is improving over time and through iteration. For transparency, each answer from Ask The Post AI is accompanied by links to the articles that served as sources, and we encourage readers to further verify responses by checking those articles.</p><p>Ask The Post AI continues to evolve and improve with every question our readers ask. User research and feedback allow us to improve and tailor how it responds. For instance, we work hard to establish when the LLM should make clear to readers that there isn’t sufficient Post coverage to offer a solid answer. 
Additionally, we’re exploring where and how Ask The Post AI should be contextually available, such as recent experiments that add the tool to some articles. And as we iterate for external needs, the AI Pod R&amp;D lab supports internal customers, leaning into our own tooling as creators and as staff users before we make features available externally to Post subscribers and users.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YDAzPHf0Qp6TgVwi" /><figcaption>The Washington Post LLM — an internal R&amp;D tool</figcaption></figure><p>Although Ask The Post AI is our main reader-oriented AI tool, it is only one of several that our team is working on, and the team is growing. <a href="https://www.washingtonpost.com/pr/2025/06/09/office-cto-announces-sam-han-chief-ai-officer-creation-wp-incubator/">The Post just named its first Chief AI Officer, Sam Han, who will guide us to innovate and expand within a strict ethical and journalistic framework</a>. Sam will also support the WP Incubator, modeled after successful Silicon Valley incubators and designed to develop cutting-edge products.</p><p>Try <a href="https://www.washingtonpost.com/ask-the-post-ai/">Ask The Post AI</a> to interact with the news or head to<a href="https://www.washingtonpost.com/"> washingtonpost.com</a> to play our games, listen to our podcasts, watch our videos, and — of course — read our award-winning journalism.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7d728178e0d6" width="1" height="1" alt=""><hr><p><a href="https://washpost.engineering/how-and-why-we-built-ask-the-post-ai-7d728178e0d6">How and Why We Built “Ask The Post AI”</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How the Washington Post Uses Smart Metering to Target Paywalls]]></title>
            <link>https://washpost.engineering/how-the-washington-post-uses-smart-metering-to-target-paywalls-8fc18a1961d6?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/8fc18a1961d6</guid>
            <category><![CDATA[smart-metering]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[paywall]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[reinforcement-learning]]></category>
            <dc:creator><![CDATA[Janith Weerasinghe]]></dc:creator>
            <pubDate>Tue, 10 Dec 2024 21:04:42 GMT</pubDate>
            <atom:updated>2024-12-10T21:55:17.633Z</atom:updated>
<content:encoded><![CDATA[<p><em>by </em><a href="https://medium.com/@janith.weerasinghe"><em>Janith Weerasinghe</em></a></p><h3>Introduction</h3><p>The Washington Post, like most subscription-based newspapers, employs paywalls to nudge users to purchase subscriptions. We want to ensure that we allow our non-subscribers to sample enough content so that they will be more likely to purchase a subscription the next time they see a paywall. We also want to encourage users to engage with us, consume our content frequently, and build habits. However, we also want to ensure that we do not relax our content access policy so much that it negatively impacts subscriptions.</p><p>One lever we can use to control subscriptions and engagement across our products is our “metering policy.” This policy controls the level of content access we give to non-subscribers. When a non-subscriber attempts to access an article, it determines whether to let the user read the article or to display some type of wall. This policy is influenced by several factors, including our business goals (in terms of engagement and subscriptions) and editorial policies.</p><p>In this blog post, we describe how we migrated from a rule-based metering policy to an automated “smart metering” policy, which resulted in a significant improvement in key metrics and also reduced the manual overhead needed to maintain metering policies.</p><h3>Some Background</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/946/0*uUUrd13_mAqUEeBH" /><figcaption>User types and the subscription funnel</figcaption></figure><p>Let’s start by defining some terms and concepts we will use throughout this blog post. We have three types of users:</p><ul><li><strong>Anonymous users</strong>: users who have not created an account with us. 
The only information we have about these users is their interactions with The Washington Post, tied to their cookies.</li><li><strong>Registered users:</strong> users who have created an account and signed in. These users do not have an active subscription. We have slightly more information about these users, including their reading history across different devices. They can receive newsletters and other emails that are aimed at nurturing engagement. Therefore, having a large pool of registered users will be beneficial for generating new subscriptions in the future.</li><li><strong>Subscribers:</strong> paying users who get access to all Washington Post content without any limits.</li></ul><p>Our smart metering model focuses only on anonymous and registered users.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5ezDIeJXXx52cGWucFhooA.png" /><figcaption>(a) An article with no walls (a pageview), (b) An article with a Multi-offer wall prompting a user to register or subscribe, (c) A paywall prompting a user to purchase a subscription</figcaption></figure><p>A good metering policy should find the right balance between engagement, lead generation (user registrations), and subscriptions. As shown in the figure above, when a non-subscriber attempts to access an article, we have three choices:</p><ul><li>We can let the user read the article. This will increase user engagement and habit building. With enough chances to explore our content, we hope that this user will eventually become a registered and/or subscribed user.</li><li>We can show a multi-offer wall (MOW). When presented with a multi-offer wall, the user has several options: they can purchase a subscription or they can create an account with us in order to access the article. The primary goal of a MOW is to generate leads.</li><li>We can show a paywall. 
When presented with a paywall, a user can purchase a subscription, choosing from a few different subscription options.</li></ul><p>As you can see, choosing which action to present to a user at each article access attempt will directly impact our three goals.</p><p>Previously, we approached this problem by hand-curating a set of rules that were updated regularly with A/B testing. Domain experts designed rules and tested them out. This approach is time-consuming and resource-intensive. Furthermore, it is hard to manually create fine-grained rules that target users and content precisely. These rules must also be tuned to meet the dynamic pace of news, special events such as sales, and business priorities.</p><p>Therefore, we wanted to employ a model to decide, under certain guardrails, which action to take when a user attempts to access an article. There are a few different ways of approaching this problem:</p><ul><li>Build separate “propensity models” that predict a propensity score for each action. For example, three models could predict a user’s likelihood to engage, register, and subscribe. Stakeholders would then create a set of rules determining the thresholds at which each metering action should be taken.</li><li>Train a reinforcement learning model that assigns metering actions to optimize for some long-term reward. We can use a combination of content and user signals, along with the signals from the propensity models, as input to the model.</li></ul><p>To minimize the need for hand-crafted rules and multiple thresholds, we opted to use a Reinforcement Learning (RL) approach. A well-trained model with an appropriate reward strategy should be able to assign metering actions to maximize long-term rewards.</p><h3>Data Collection</h3><p>To use any machine learning approach, we must gather a dataset showing users’ behavior patterns under different metering policies. 
To this end, we ran a randomized controlled trial (RCT) in which we randomly assigned different metering actions when a user requested to read an article. For an anonymous user, we could assign three possible actions: let the user read the article, display a multi-offer wall, or show a paywall. Similarly, for a registered user, we could let the user read the article or show a paywall. During this RCT, for each article access attempt, we randomly chose a metering action among the eligible actions.</p><p>The resulting dataset gave us a treasure trove of data indicating how likely a user is to register or subscribe at different points in their journey. It allowed us to slice and dice the data across different dimensions to answer questions like, “Are users more likely to subscribe after reading three articles or four?”</p><p>With the RCT data, we were able to create granular user segments and analyze user behavior under various metering actions. It became clear that, in order to improve our metering policy, we would have to make metering decisions at a more granular level rather than apply a one-size-fits-all policy. This observation emphasized the need for a model that can learn a more fine-grained metering policy.</p><h3>Metering as a Reinforcement Learning (RL) Problem</h3><p>Before moving on to how we approached metering as an RL problem, we will briefly introduce reinforcement learning.</p><p>RL algorithms are a class of machine learning algorithms that specify how an <strong>agent, </strong>operating in a dynamic <strong>environment, </strong>takes different <strong>actions</strong> to maximize a <strong>reward</strong>. Unlike supervised machine learning algorithms, RL algorithms assume that the environment will change as a result of the agent’s actions. Furthermore, the objective of RL algorithms is to learn a <strong>policy</strong> that maximizes the <strong>cumulative </strong>rewards over sequences of actions (called <strong>episodes</strong>). 
An RL policy determines which action to take at any given state. RL has been successfully used in a number of domains, including robotics (locomotion, grasping, navigation), gaming (automatically playing games like Atari from video input, and games like Go), and self-driving cars.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*a78Prh4MJFv1w60y" /><figcaption>Metering as a Reinforcement Learning System</figcaption></figure><p>We noticed that our metering problem fits neatly into the reinforcement learning paradigm. Here, the <strong>agent</strong> would be our metering agent. The <strong>environment</strong> is the state that describes the user and the content that they are trying to access. Given this state, the agent can take up to three <strong>actions</strong>, and after an action is executed, the agent will receive a <strong>reward</strong> based on the user’s reaction. Each <strong>episode</strong> would be an individual user’s sequence of article accesses until they become a subscriber or until we see no further access attempts from the user (as shown in the figure below). The agent’s goal would be to assign actions at each step of a user’s journey such that it maximizes the cumulative reward of each episode.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*6Daq-doc5JoEwQIj" /><figcaption>Two example episodes</figcaption></figure><p>Our reward policy assigns a reward to each metering action based on the corresponding user reaction. For example, if the agent decides to show an article (which leads the user to engage with the article), a certain reward is added. Similarly, if a paywall is shown and the user subscribes, a (much larger) reward is assigned. Defining the right reward policy is a crucial step to ensure that the metering agent’s optimizations align with our business needs. 
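As a toy illustration of such a reward policy, a lookup keyed on the action taken and the observed user reaction might look like the following. The action names, outcome labels, and numeric values here are all hypothetical, not the actual reward structure:

```python
# Illustrative reward table for a metering agent: (action, user reaction)
# maps to an immediate reward. The relative magnitudes mirror the text:
# engagement earns a small reward, a registration a larger one, and a
# subscription a much larger one. All numbers are made up.

REWARDS = {
    ("show_article", "engaged"):      1.0,   # habit building
    ("show_article", "bounced"):      0.0,
    ("show_mow",     "registered"):   5.0,   # lead generated
    ("show_mow",     "dismissed"):   -0.5,
    ("show_paywall", "subscribed"): 100.0,   # much larger terminal reward
    ("show_paywall", "abandoned"):   -1.0,
}

def reward(action, outcome):
    """Look up the reward for an action/outcome pair (0 if unknown)."""
    return REWARDS.get((action, outcome), 0.0)
```

Tuning these relative magnitudes is exactly the stakeholder conversation described here: the ratio between the engagement, registration, and subscription rewards encodes how the agent trades the three goals off against each other.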
Therefore, we worked with our business stakeholders to define our reward structure carefully.</p><h3>How the RL Algorithm Works</h3><p>After formalizing metering decision-making as an RL problem, our next step was to determine the type of RL algorithm we wanted to implement.</p><p>Most reinforcement learning algorithms are designed for “online” settings, in which the agent can actively explore the environment and receive instantaneous feedback from the environment to improve the policy iteratively. While online RL has been very successful in domains such as games and virtual environments, where the agent can be allowed to explore the environment without any risk, in other settings, online data collection is impractical because it is costly and/or risky. Furthermore, most online RL algorithms are unable to use previously collected data. Therefore, any changes to the model would require retraining from scratch.</p><p>In our setting, it would be costly for us to implement the infrastructure needed to compute the appropriate reward from user actions immediately after the agent makes a prediction. It is also risky for us to let an agent explore the state space on live production traffic at a large scale. We also wanted to be able to iteratively improve and change our models easily, without having to retrain models in a live environment. Furthermore, we wanted to be able to use the rich dataset we gathered from our previous RCT.</p><p>As a result of these requirements and constraints, we decided to look at offline RL algorithms. Offline RL algorithms learn policies from a static dataset without needing to interact with an environment. One downside of this approach is that online RL algorithms can usually find good policies faster, and often find better ones, than their offline counterparts.</p><p>Most online and offline RL algorithms learn an action-value function, usually referred to as a Q-function. 
The Q-function, <em>Q(s, a)</em>, predicts the expected cumulative reward that we can achieve if we started from state <em>s</em>, took action <em>a</em>, and then followed the current policy. The Q-function can be defined recursively using the Bellman equation as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uo7pbY75yM4Q3MgS2hnLVg.png" /></figure><p>Where:</p><ul><li><em>r(s, a)</em> is the immediate reward that is gained by being in state <em>s</em> and taking action <em>a</em></li><li>γ is the discount factor, which determines how much we favor early rewards over later rewards</li><li><em>s’</em> is the next state reached after taking action <em>a</em> from state <em>s</em></li><li>The term <em>maxₐ’Q(s’, a’)</em> is the maximum expected cumulative reward that can be achieved from state <em>s’</em>, taken over all actions <em>a’</em> available there</li></ul><p>Once we have a learned Q-function, the agent operates at prediction time as follows: given a state that represents the user and the content that they are trying to access, apply the Q-function to all possible actions that can be taken from the given state and output the action with the highest Q value.</p><p>Both online and offline RL algorithms are trained on <strong>state transitions</strong>. Each transition contains a particular state <em>s</em>, the action taken <em>a</em>, the resulting state <em>s’</em>, and the reward <em>r(s, a)</em>. In offline RL, we assume the data is static and generated from another process (an older model, a manual rule-based system, etc.).</p><h4>First approach: A Q-Table with Q Value Iteration</h4><p>In our first version of the smart metering model, we opted for a simpler model with a limited state space and trained a model using Q value iteration. Here, the Q-function is simply represented by a table (a Q-table) with a value for each state-action pair. 
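To make the tabular version of these ideas concrete, here is a minimal sketch of learning a Q-table from logged transitions with a Bellman-style update and then acting greedily at prediction time. The states, actions, rewards, and hyperparameters are hypothetical, and this is far simpler than a production system:

```python
from collections import defaultdict

# Tabular Q learning from logged transitions. Each transition is
# (state, action, reward, next_state, done); states and rewards are made up.

ACTIONS = ["show_article", "show_mow", "show_paywall"]
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor

Q = defaultdict(float)  # maps (state, action) -> estimated cumulative reward

def update(state, action, reward, next_state, done):
    """One Bellman-style update: Q += alpha * (target - Q)."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def greedy_action(state):
    """Prediction time: output the action with the highest Q value."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

# Replay a tiny logged dataset several times until values stabilize.
transitions = [
    ("new_reader", "show_article", 1.0, "engaged_reader", False),
    ("engaged_reader", "show_paywall", 100.0, "subscriber", True),
]
for _ in range(50):
    for t in transitions:
        update(*t)
```

After training, the greedy policy recommends showing the article to the hypothetical new reader (the discounted subscription reward propagates back through the Bellman target) and the paywall to the engaged reader.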
Then we used the training data to iteratively update the Q table as follows, where α is the learning rate:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zd1H-w4fSF9OIjXyq2Hapg.png" /></figure><p>This simpler model allowed us to deploy a smart metering solution faster, and it significantly outperformed our previous rule-based approach on key metrics. Having this simpler model also allowed us to iterate on and improve our reward strategy.</p><p>While the Q-table-based approach was easier and faster to implement, it had a few downsides. Primarily, because the state-action values are stored as a table, adding new signals to the state space causes this table to grow exponentially. Furthermore, for the learning to be effective, our training dataset needs fair coverage of data points for all unique state-action pairs. Unlike a neural network, a table-based approach cannot generalize using data from other state-action pairs in the state-action space. We are also limited to using discrete features in our state space, since the table-based approach cannot handle continuous values.</p><p>Another common issue that arises when training an offline RL model is that, due to the inability to explore, the learning algorithms run the risk of overestimating the values of states that are not well represented in the training data. Online RL avoids this issue because it allows those spaces to be explored and their values to be updated. However, the risk of this occurring in our table-based approach was low because we ensured our state space was small and had good coverage.</p><h4>Second Approach: Deep Reinforcement Learning with CQL</h4><p>After we were confident with our initial table-based model and the reward function we used, we moved away from the table-based approach to a neural network-based approach in order to support a more complicated state space. 
We used Conservative Q Learning (CQL)¹, a deep reinforcement learning (DRL) method, to train a neural network to estimate the state-action values. CQL mitigates the overestimation of Q values for out-of-distribution state-action pairs by regularizing the objective, ensuring that the learned policy does not deviate too much from the dataset.</p><p>With CQL, we were able to incorporate more features into our state space, and our tests showed that CQL was able to learn better policies than the table-based approach.</p><h3>Evaluating Models</h3><p>While we use A/B testing to validate our new models and ensure we are comfortable with their performance before they are fully adopted, we also wanted a way to test our models offline, before even running an A/B test. This allows us to train multiple models, evaluate them offline, and pick a few candidate models for A/B testing.</p><p>Unlike supervised machine learning, where offline testing is straightforward and standard practices (creating train/validation/test datasets) and metrics (such as precision and recall) exist, evaluating RL models offline is difficult. This is because the actions from RL models change the environment, and we do not know how a user would have reacted to an action that was never actually taken, or what their subsequent actions would have been. For example, our static datasets may contain an episode where a user was shown a multi-offer wall after reading three articles and then registered. However, the trained model may predict that showing a paywall at this instance would be the best course of action. Because we do not know how this user would have reacted, we have to make estimates using the static data we have.</p><p>We used two approaches to evaluate the performance of our models offline. 
Both approaches provide an estimate of the average cumulative reward that can be expected from all the episodes:</p><p><strong>Importance Sampling:</strong> We can estimate what the cumulative reward of an episode from our static dataset would have been under a new model by computing the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h2XBpHQagdM9HiGPtSJh0Q.png" /></figure><p>The probability of an episode under a policy can be computed as the product of that policy’s action probabilities over the state-action pairs in the episode. One downside of computing importance samples as above is that if the probability of a certain state-action pair is very low in the dataset but very high in the new model, the reward values will be massively overestimated. To avoid this, we used the approach suggested by Thomas et al.² to come up with a lower bound with a confidence interval for the predicted rewards.</p><p><strong>Fitted Q Evaluation (FQE)³: </strong>FQE iteratively learns another Q-function based on a given trained policy. The Q-function trained by FQE estimates the value of the learned policy more accurately than the policy’s own Q-function does. We can then compute the average Q value over all the initial states we see in our training dataset to arrive at an estimate of the average cumulative rewards.</p><p>In our experiments, we found that the FQE and importance sampling estimates largely agreed.</p><h3>Conclusion</h3><p>In this blog post, we discussed how we implemented a smart metering approach to replace our previous rule-based metering strategy. After deploying it, we saw significant improvements in key metrics and noticed that the smart metering model can target walls with much greater accuracy. Apart from the improvements made to the conversion metrics, using an automated approach helped cut down on the significant overhead that was required to maintain manually curated metering rules. 
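As a small illustration of the ordinary importance-sampling estimator described in the evaluation section (without the lower-bound correction of Thomas et al.), here is a sketch with hypothetical policies and logged episodes:

```python
# Ordinary importance sampling for off-policy evaluation: weight each logged
# episode's return by the ratio of its probability under the new policy to
# its probability under the behavior policy, then average. Episodes and
# policies below are made up for illustration.

def is_estimate(episodes, new_policy_prob, behavior_prob):
    """Average of importance-weighted episode returns."""
    total = 0.0
    for episode in episodes:  # episode: list of (state, action, reward)
        weight, ret = 1.0, 0.0
        for state, action, reward in episode:
            weight *= new_policy_prob(state, action) / behavior_prob(state, action)
            ret += reward
        total += weight * ret
    return total / len(episodes)

# Behavior policy chose uniformly between two actions; the candidate policy
# always shows the paywall.
behavior = lambda s, a: 0.5
new_policy = lambda s, a: 1.0 if a == "show_paywall" else 0.0

episodes = [
    [("s0", "show_paywall", 100.0)],
    [("s0", "show_article", 1.0)],
]
```

The second episode gets weight zero because the candidate policy would never have taken that action, which is also how the variance problem arises: rare behavior-policy actions that the new policy favors receive very large weights.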
While we use machine learning to make metering decisions, we have a number of guardrails, including business rules and editorial decisions, that apply on top of the smart metering decisions. We also ensure that we only use first-party data to train our models and that we adhere to our <a href="https://www.washingtonpost.com/policies-and-standards/#ai">AI Policy</a> when using machine learning in our products.</p><h3>Acknowledgements</h3><p>The author would like to thank everyone who worked on this project, including James Nance, Meng Ling, Himanshu Jahagirdar, Sophie Katasi, Shuguang Wang, Sam Han, our Subscriptions Engineering team, and our Analytics team.</p><h3>References</h3><ol><li>Kumar, Aviral, et al. “Conservative q-learning for offline reinforcement learning.” <em>Advances in Neural Information Processing Systems</em> 33 (2020): 1179–1191.</li><li>Thomas, Philip, Georgios Theocharous, and Mohammad Ghavamzadeh. “High-confidence off-policy evaluation.” <em>Proceedings of the AAAI Conference on Artificial Intelligence</em>. Vol. 29. No. 1. 2015.</li><li>Le, Hoang, Cameron Voloshin, and Yisong Yue. “Batch policy learning under constraints.” <em>International Conference on Machine Learning</em>. PMLR, 2019.</li></ol><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8fc18a1961d6" width="1" height="1" alt=""><hr><p><a href="https://washpost.engineering/how-the-washington-post-uses-smart-metering-to-target-paywalls-8fc18a1961d6">How the Washington Post Uses Smart Metering to Target Paywalls</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Analyzing Voter Behavior Patterns: Insights from The Washington Post]]></title>
            <link>https://washpost.engineering/analyzing-voter-behavior-patterns-insights-from-the-washington-post-07284ee5cd77?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/07284ee5cd77</guid>
            <category><![CDATA[elections]]></category>
            <category><![CDATA[voter-behavior]]></category>
            <category><![CDATA[united-states]]></category>
            <category><![CDATA[primaries]]></category>
            <category><![CDATA[caucus]]></category>
            <dc:creator><![CDATA[Diane M. Napolitano]]></dc:creator>
            <pubDate>Mon, 01 Apr 2024 20:56:45 GMT</pubDate>
            <atom:updated>2024-04-01T21:07:36.489Z</atom:updated>
<content:encoded><![CDATA[<p><em>by </em><a href="https://medium.com/u/dde5780592fc"><em>Diane M. Napolitano</em></a><em> and </em><a href="https://medium.com/u/ac6fa0f5b7ff"><em>Lenny Bronner</em></a></p><p>With any election, we might be interested in not only <em>who</em> won, but <em>how</em>. Who are the people who cast votes for any one candidate? Are they young? Do they live in small towns or big cities? And, maybe most interestingly, who did they vote for in previous elections?</p><p>Exit Polls often provide us with answers to these questions. Pollsters wait outside polling locations or call early voters at home and ask a (hopefully) representative sample of voters about themselves and who they voted for. For example, when pollsters asked voters at the New Hampshire Democratic Primary in 2020 both who they voted for in the current election and who they voted for in the 2016 Primary, they responded:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ECYVF8FMQEE82cm6xCktpw.png" /><figcaption>Source: Edison Research via <a href="https://www.cnn.com/election/2020/primaries-caucuses/entrance-and-exit-polls/new-hampshire/democratic">CNN</a></figcaption></figure><p>According to these results, roughly 13% of voters who voted for Hillary Clinton in the 2016 Democratic Primary voted for Joe Biden in 2020, approximately 25% of the same pool of voters voted for Pete Buttigieg, and so on. We can also look at these results visually:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UIZEfbVamjRNCMhGN9zkJg.png" /></figure><p>But what if this Exit Poll wasn’t conducted? Or what if it was, but this particular question wasn’t asked? Or, what if we have reasons to believe that the Exit Polls might not be capturing the full story? 
Ideally, we’d look at individual voters and count how many of them voted for any given candidate in each election, but access to individual-level data such as this would violate the right to a secret ballot. (When elections are conducted by caucus, individual-level votes aren’t even recorded; the final tallies are taken by counting heads or hands.)</p><p>Here at The Washington Post, we have developed models that infer what we call <strong>Voterflow</strong>, patterns in the way votes “flow” from one election to the next between candidates. These kinds of models are quite common in multi-party electoral systems, but haven’t seen much use in the United States so far.</p><p>Let’s think through how this might work:</p><p>Imagine you voted in the 2020 New Hampshire Democratic Primary. There are a number of possibilities for what you did in the 2016 primary: maybe you voted for Hillary Clinton, maybe you voted for Bernie Sanders, or maybe you voted for neither (because you were not old enough to vote, you didn’t live in New Hampshire at the time, or maybe you simply didn’t vote). In fact, we know that one of these options had to have applied to you in 2016. That means that the total number of votes that each candidate received in the 2020 primary, <em>tⱼ</em> (Joe Biden, Pete Buttigieg, Bernie Sanders, etc.), can be thought of as a weighted sum of the votes received by each of the 2016 options:</p><figure><img alt="tⱼ = β₁ⱼ * Hillary Clinton (2016) + β₂ⱼ * Bernie Sanders (2016) + β₃ⱼ * Neither (2016)" src="https://cdn-images-1.medium.com/max/1024/1*M3sXn0oy6FVvhu3wS9oLkw.gif" /></figure><p>We can interpret <em>β₁ⱼ</em> as the fraction of 2016 Hillary Clinton voters who voted for candidate <em>j</em> in 2020, and, more generally, <em>βᵢⱼ</em> as the fraction of 2016 voters in option <em>i</em> who did so. This means if we can estimate <strong><em>βᵢⱼ</em></strong>, we have an answer to our question.</p><p>To estimate the parameters of interest, let’s assume that <em>βᵢⱼ</em> is constant over an entire state. 
Then for every geographic unit 1, …, <em>g</em> that we have results for:</p><figure><img alt="Switching out of dark-mode recommended (sorry!)." src="https://cdn-images-1.medium.com/max/1024/1*bB5qsXOdTdRHa7NONKrgeA.png" /></figure><p>We can see that we need election results in order to do this. Ideally, we obtain these results at the level of the smallest geographic unit into which voters in that state are organized, as close to individual voters as possible. In New Hampshire, this happens to be individual towns and cities, but in most states, these are precincts. You can read more about geography, precincts, and how these relate to the way election results are tallied and reported <a href="https://washpost.engineering/what-the-washington-post-elections-engineering-team-had-to-learn-about-election-data-a41603daf9ca">here</a>.</p><p>It’s important that we use data from the smallest geographic unit available so that we reduce the extent to which we commit the ecological fallacy. This fallacy occurs when we make assumptions or inferences about every individual within a group based on aggregate behavior. More information on, and examples of, the ecological fallacy can be found in Freedman (1999, below) and also <a href="https://helpfulprofessor.com/ecological-fallacy-examples/">here</a>.</p><p>How does this apply to our Voterflow models? In New Hampshire, Bernie Sanders won the 2016 Democratic Primary with about 60% of the vote, and won again in 2020, but this time with only 26% of the vote. To assume that the entire 26% came from the 2016 pool of Sanders voters would be committing the ecological fallacy: in 2020, some voters voted differently, or not at all, while others became Sanders supporters. 
Our task is to estimate how many voters fall into each of these groups, and using data from the smallest geographic units available gets us as close to understanding individual voter behavior as possible.</p><p>To do this, we need a data set reflecting the election we want to see votes flow from (<em>F</em>):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zGgL6Q4mc1eFfA0HadTCfQ.png" /><figcaption>Source: AP</figcaption></figure><p>And a corresponding data set with the same geographic units reflecting the election we want to see votes flow to (<em>T</em>):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XWW0iuZcXyRHaepwkAMUVg.png" /><figcaption>Source: AP</figcaption></figure><p>Using these results, we can now re-write our equations from above as:</p><p><em>T = F ·β</em></p><p>and it becomes clear that if we want to estimate <strong><em>β</em></strong> we can use our favorite optimization library.</p><p>We also need data on the number of registered voters who were able to participate in the same election as our <em>T</em> data set (in this case, registered voters in New Hampshire by the time of the 2020 Democratic Primary):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8eBT7X4QGQS4bkUaao5ybg.png" /><figcaption>Source: New Hampshire Secretary of State</figcaption></figure><p>We’ll use this to weight each geographic unit by the number of registered voters. This helps our model learn the importance of larger geographic units, which are otherwise represented equally with only one row in <em>F</em> and <em>T</em>.</p><p>Now that we have our data, our next step is to normalize it. 
We’ll convert both <em>F</em> and <em>T</em> from raw vote counts to percentages by summing each row and dividing the number of votes received by candidate <em>i</em> in each row by this sum.</p><p>We’ll also normalize the number of registered voters (RV) in every geographic unit <em>u</em> such that:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/200/1*xv7YtT-442QVMN_8wZNR0w.png" /></figure><p>Let’s refer to the entire set of normalized <em>RVᵤ</em> values as <em>W</em>, our model’s weights.</p><p>Now, our normalized <em>F</em>, <em>T</em>, and <em>W</em> are going to be used to solve a matrix regression. We set the following constraints:</p><ul><li><em>β</em> &gt;= 0 and <em>β</em> &lt;= 1 will ensure that our inferences stay in the non-negative percentage space, same as our Exit Poll; but also,</li><li><em>∑βᵢ</em> = 1 will guarantee that, like the rows in our Exit Poll data, the inferred <strong><em>βᵢⱼ</em></strong> terms will sum to 1 for every candidate in our <em>F</em> data set; in this case, every candidate in the 2016 New Hampshire Democratic Primary. The intuition here is that every person who voted in 2020 needs to have done something in 2016: either they voted for one of the candidates or they didn’t vote at all.</li></ul><p>And we solve the following loss function subject to these constraints:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/206/1*GTsHbWyJ-fDc5UbCkuOMXQ.png" /></figure><p>You can find our Python implementation of this Transition Matrix Solver in our <a href="https://github.com/washingtonpost/elex-solver">elex-solver</a> package.</p><p>We now have a matrix of <strong><em>β</em></strong> terms that contains a solution similar to our Exit Poll example: for every candidate <em>i</em> in our <em>F</em> data set, what percentage of their vote went to candidate <em>j</em> in our <em>T</em> data set? 
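To give a feel for what that solver does, here is a minimal sketch of the constrained, weighted least-squares step using scipy's SLSQP on synthetic data. This is an illustration only, not the actual elex-solver implementation, and every matrix in it is made up:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic example: 40 geographic units, 3 "from" options, 3 "to" candidates.
true_beta = np.array([
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
])
F_counts = rng.integers(200, 5_000, size=(40, 3)).astype(float)
T_counts = F_counts @ true_beta          # the flows we will try to recover
registered = F_counts.sum(axis=1)

# Normalize rows of F and T to percentages, and weights W to sum to 1.
F = F_counts / F_counts.sum(axis=1, keepdims=True)
T = T_counts / T_counts.sum(axis=1, keepdims=True)
W = registered / registered.sum()

n_from, n_to = true_beta.shape

def loss(b_flat):
    # Weighted squared residual of T - F @ B.
    B = b_flat.reshape(n_from, n_to)
    return np.sum(W[:, None] * (T - F @ B) ** 2)

# Constraints: 0 <= beta <= 1, and each row of beta sums to 1.
row_sums_to_one = [
    {"type": "eq", "fun": lambda b, i=i: b.reshape(n_from, n_to)[i].sum() - 1.0}
    for i in range(n_from)
]
res = minimize(
    loss,
    x0=np.full(n_from * n_to, 1.0 / n_to),
    method="SLSQP",
    bounds=[(0.0, 1.0)] * (n_from * n_to),
    constraints=row_sums_to_one,
)
beta_hat = res.x.reshape(n_from, n_to)
print(np.round(beta_hat, 2))
```

Because the synthetic T here is generated exactly from true_beta, the recovered beta_hat should land very close to it; with real election data, the residual is what the loss function trades off.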
Or, to follow our example, what percentage of Hillary Clinton’s, Bernie Sanders’s, and the other 2016 candidates’ votes in the New Hampshire Democratic Primary went to each of Joe Biden, Pete Buttigieg, Bernie Sanders, and so on, in the 2020 primary?</p><p>But wait! This is just one set of voter-flows for an entire state. Is it reasonable to believe that the same fraction of Hillary Clinton voters went to Pete Buttigieg in every single part of New Hampshire? Should we not allow our model some flexibility?</p><p>We know that different areas within each state tend to vote similarly, for instance:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3721rOANov-EfmFMkIW02w.png" /></figure><p>These three maps illustrate some interesting properties about New Hampshire:</p><ol><li>Most of the state is rural, but becomes more “suburban-y” in the southeast. This happens to be where New Hampshire’s capital (Concord) and its largest cities (Manchester and Nashua) are located.</li><li>The first map, showing the percent of registered voters per township that voted for Bernie Sanders in 2020, is almost the inverse of the second map, showing the percent of residents per township that have at least a Bachelor’s degree.</li></ol><p>In fact, voting for Bernie Sanders in 2020 and having at least a Bachelor’s degree in New Hampshire have a Pearson correlation (<em>ρ</em>) of -0.51, indicating a moderate inverse relationship: in 2020, places with a larger proportion of voters with formal higher education gave less of their support to Bernie Sanders. We can even look at this relationship by County Classification:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TCzOt_9_A-ilCJvfWf4dDg.png" /></figure><p>In other words, voters in more rural parts of New Hampshire with at least a Bachelor’s degree were more likely to vote for Bernie Sanders in 2020 than their equivalents in more suburban parts of the state. 
If it’s true that voters in areas with more formal education are less likely to vote for Bernie Sanders, it might also be true that these voters have a different propensity to choose Bernie Sanders or Pete Buttigieg in the next election.</p><p>We can use this to improve our Voterflow model by stratifying on each of these three pieces of information. Each geographical unit in our data falls into one stratum <em>s</em> of a given stratification <em>S</em>, where <em>S</em> is either categorical (Rural townships, Suburban townships) or continuous (percent of registered voters per township that voted for Bernie Sanders in 2020, percent of residents per township that have at least a Bachelor’s degree), in which case we’ll create <em>s</em> by dividing <em>S</em> into quantiles. We’ll solve our weighted matrix regression separately for each <em>s</em>, combine each of these into a single solution <strong><em>β</em>ₛ</strong> for each <em>S</em>, and take the mean across all <strong><em>β</em>ₛ</strong>.</p><p>As a result, instead of assuming that the same fraction of 2016 Bernie Sanders voters picked Joe Biden in 2020 across the entire state, our model now only assumes that this is true within each stratum. A more reasonable assumption.</p><p>Here’s what our final weighted stratified model’s output looks like:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GpeK014EzmvTXtmiTxkQsQ.png" /></figure><p>How do we make sure that our model’s output is reasonable? Ideally, our output and Exit Polls should be telling the same story since they are trying to answer the same question. 
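Before checking the output, here is a rough sketch of that stratify, solve, and average procedure. The strata labels, flow matrices, and the plain least-squares stand-in for the constrained solver are all hypothetical simplifications:

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_flows(F, T):
    # Stand-in for the constrained transition solver: plain least squares.
    B, *_ = np.linalg.lstsq(F, T, rcond=None)
    return B

# 50 made-up geographic units, half rural and half suburban, each stratum
# with its own true flow matrix (2 "from" options x 2 "to" candidates).
strata = np.array(["rural"] * 25 + ["suburban"] * 25)
flows = {
    "rural":    np.array([[0.7, 0.3], [0.2, 0.8]]),
    "suburban": np.array([[0.5, 0.5], [0.4, 0.6]]),
}
F = rng.uniform(0.2, 0.8, size=(50, 2))
T = np.vstack([F[:25] @ flows["rural"], F[25:] @ flows["suburban"]])

# Solve separately within each stratum, then average the estimates
# into a single solution for this stratification.
per_stratum = [solve_flows(F[strata == s], T[strata == s])
               for s in np.unique(strata)]
beta_s = np.mean(per_stratum, axis=0)
print(np.round(beta_s, 2))
```

Each stratum's regression only has to fit the units inside it, so the combined estimate no longer forces one flow pattern on the whole state.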
To check, we can review our Exit Poll:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ECYVF8FMQEE82cm6xCktpw.png" /></figure><p>The output from our model, our inferred voter-flows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IpgzV8p7eU3Tlaewi3wuMQ.png" /></figure><p>And examine the difference between them:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gfBJyBRrLBU2kCwaF5tAzQ.png" /></figure><p>This tells us that, for instance, our model’s inferences for the percentage of 2016 Bernie Sanders voters who voted for:</p><ol><li>Joe Biden in 2020 are lower than those in the Exit Poll by 2.6%;</li><li>an “other” candidate in 2020 are <em>higher</em> than those in the Exit Poll by 23%!</li></ol><p>How could this difference be so large?</p><p>This Exit Poll only includes 952 respondents. By contrast, our model uses aggregate data from 236 townships comprising 712,216 registered voters in 2020. To compensate for this and other ways in which this survey could be off, including sampling variability, we’ll compute a 95% confidence interval around the percentage every candidate pair received in the Exit Poll. This way we can ensure that if the same Exit Poll were conducted multiple times, the true value of the population surveyed would be captured within this confidence interval 95% of the time.</p><p>We’ll use this 95% confidence interval along with the survey question’s answers themselves to measure the extent to which our inferences tell the same story as the Exit Poll. The Mean Absolute Error (MAE) over all candidate pairs is 8.5%, and if we look at the upper and lower bounds of the confidence interval we get a range of MAEs of 8.5–8.8%.</p><p>Ideally, MAE should be as close to zero as possible, so why isn’t it here? This is likely because:</p><ul><li>The “Neither (2016)” option includes not only voters who voted for a candidate other than Clinton and Sanders in 2016, but also non-voters. 
Our data contains more information on non-voters than the pollsters were able to gather, probably because most of the 2016 non-voters weren’t at their poll site in 2020 — most non-voters tend to remain non-voters.</li><li>Given the small sample size, it’s likely that pollsters interviewed more voters who voted for one of the major candidates than for a candidate that falls under the “Neither” or “Other” options. Not only that, but between elections, people tend to misremember if they voted and who they voted for, mistakenly thinking they voted for one of the major candidates.</li></ul><p>This isn’t the only election and Exit Poll combination we’ve used to evaluate this modeling approach. We also use the <a href="https://apnorc.org/projects/ap-votecast-2020/">2020 Democratic Primary VoteCast survey</a> produced by the Associated Press and the National Opinion Research Center at the University of Chicago (AP-NORC). This survey provides responses to a question similar to the one in our Exit Poll example for 17 states.</p><p>Unfortunately, we aren’t able to run our stratified model on all 17 states due to a lack of available precinct-level data and an insufficient number of counties, cities, and towns, a.k.a. Reporting Units (RUs). When measuring MAE, we’ll take this into account and compare what happens when we don’t use precinct-level data.</p><p>As you can see, using a large number of geographic units that are as close as possible to individual voters makes a big difference:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CkPNEmpAuIgjItb4G-UXlg.png" /></figure><p>Averaged across all states, MAE compared to the mean AP-NORC VoteCast survey response ranged from 3.8% to 7.0% when precinct-level data was used and 6.0% to 7.9% when RU-level data was used.</p><p>The bottom plot shows the relationship between the number of geographic units and MAE with a regression line: in general, as the number of small geographic units increases, MAE decreases. 
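The validation math itself is short. In this sketch, the respondent count comes from the Exit Poll above, but the poll and model shares are placeholder numbers, not the published figures:

```python
import numpy as np

n = 952  # Exit Poll respondents

# Placeholder shares for a handful of candidate pairs (poll vs. model).
poll = np.array([0.13, 0.25, 0.50, 0.12])
model = np.array([0.10, 0.28, 0.47, 0.15])

# 95% normal-approximation confidence interval around each poll share.
z = 1.96
half_width = z * np.sqrt(poll * (1 - poll) / n)
lower, upper = poll - half_width, poll + half_width

# MAE against the poll itself, plus MAEs against the interval's bounds,
# gives a point estimate and a range for the disagreement.
mae = np.abs(model - poll).mean()
mae_range = (np.abs(model - lower).mean(), np.abs(model - upper).mean())
print(mae, mae_range)
```

With more respondents, the interval narrows and the range of MAEs tightens around the point estimate.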
Our model performs best on states with a large number of precincts, data as close to the individual voter as possible.</p><p>It’s also worth noting here that every state’s strata are slightly different. In our 2020 New Hampshire Democratic Primary example, we selected three strata based on domain knowledge. In general, we typically combine this with the measured variance of every possible stratification we could use. For instance, per state, we’ll query either the <a href="https://www.census.gov/programs-surveys/acs/data/data-via-api.html">American Community Survey</a> or the <a href="https://l2-data.com/">L2 Voterfile</a> for all available demographic data relating to education level, select the one with the largest variance, and repeat this for other demographic categories such as age or ethnicity. Our use of variance here helps us identify strata that combat the ecological fallacy — a high variance tells us these variables cover a wide spread of geographic units. Whether this is the best possible approach remains an open question.</p><p>Regardless, so far we’ve used our Voterflow modeling approach to analyze two elections in the 2024 Republican Primary/Caucus cycle:</p><ul><li><a href="https://www.washingtonpost.com/elections/2024/01/16/trump-iowa-results/">See how Trump more than doubled his support in Iowa since 2016</a> by Lenny Bronner, Derek Hawkins, Luis Melgar, and Diane Napolitano</li><li><a href="https://www.washingtonpost.com/politics/2024/01/24/new-hampshire-results-republican-voters/">See how Trump pulled support from his 2016 rivals to fuel New Hampshire win</a> also by Lenny Bronner, Derek Hawkins, Luis Melgar, and Diane Napolitano</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e-xty8WBJr6AmRQkZsqj3g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*03kWYkgAmN7s05G2_Bs7oA.png" /></figure><h3>Related Work</h3><p>When developing our weighted stratified Voterflow model, we drew on prior 
work from:</p><ul><li>the <a href="https://www.sora.at/en/topics/electoral-behavior/election-analyses/voter-transition-analyses.html">SORA Institute</a> described in:<br>· Hofinger, C., &amp; Ogris, G. (2002). Orakel der Neuzeit: was leisten Wahlbörsen, Wählerstromanalysen und Wahltagshochrechnungen? <em>Österreichische Zeitschrift Für Politikwissenschaft</em>, <em>31</em>(2), 143–158. <a href="https://nbn-resolving.org/urn:nbn:de:0168-ssoar-59937">https://nbn-resolving.org/urn:nbn:de:0168-ssoar-59937</a></li><li>David Freedman’s work on ecological regression and the ecological fallacy, specifically:<br>· Freedman, D. A. (1999). Ecological Inference and the Ecological Fallacy. <em>International Encyclopedia of the Social &amp; Behavioral Sciences</em>, <em>6</em>(4027–4030), 1–7. <a href="https://statistics.berkeley.edu/sites/default/files/tech-reports/549.pdf">https://statistics.berkeley.edu/sites/default/files/tech-reports/549.pdf</a><br>· Freedman, D. A., Klein, S. P., Ostland, M., &amp; Roberts, M. R. (1998). Review: A Solution to the Ecological Inference Problem. <em>Journal of the American Statistical Association</em>, <em>93</em>(444), 1518–1522. <a href="http://www.jstor.org/stable/2670067">http://www.jstor.org/stable/2670067</a></li></ul><p>As we continue to refine our approach, we’re also looking into Bayesian solutions to the ecological inference/regression problem, including:</p><p>Rosen, O., Jiang, W., King, G., &amp; Tanner, M. A. (2001). Bayesian and Frequentist Inference for Ecological Inference: The RxC Case. <em>Statistica Neerlandica</em>, <em>55</em>, 134–156. 
<a href="https://gking.harvard.edu/files/em.pdf">https://gking.harvard.edu/files/em.pdf</a></p><h3>Acknowledgements</h3><p>We are grateful for the feedback and support from our colleagues at The Post and our external collaborators, including John Campbell, John Cherian, Scott Clement, Reuben Fischer-Baum, Sarah Frostenson, Sam Han, Derek Hawkins, Luis Melgar, Ashlyn Still, Rachel Van Dongen, and Shuguang Wang.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=07284ee5cd77" width="1" height="1" alt=""><hr><p><a href="https://washpost.engineering/analyzing-voter-behavior-patterns-insights-from-the-washington-post-07284ee5cd77">Analyzing Voter Behavior Patterns: Insights from The Washington Post</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Rethinking Content Insights at The Post]]></title>
            <link>https://washpost.engineering/rethinking-content-insights-at-the-post-02e0a323179d?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/02e0a323179d</guid>
            <category><![CDATA[self-service]]></category>
            <category><![CDATA[data]]></category>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[medium]]></category>
            <category><![CDATA[analytics]]></category>
            <dc:creator><![CDATA[Jason Langsner]]></dc:creator>
            <pubDate>Thu, 28 Mar 2024 18:04:12 GMT</pubDate>
            <atom:updated>2024-03-28T18:04:12.093Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="A Washington Post Senior Data Engineer speaks about GRAHAM and Article 360 in a Tech Demo Day." src="https://cdn-images-1.medium.com/max/1024/0*_VP7jhg16SOwuqE4" /><figcaption><em>A Washington Post Senior Data Engineer speaks about GRAHAM and Article 360 in a Tech Demo Day.</em></figcaption></figure><p>The Washington Post makes countless data-informed decisions daily. As the lights never turn off in our newsroom and content is produced through 24/7/365 global operations, we have a lot to measure. And we have a lot of opportunities to turn those measurements into insights that inform.</p><h4><strong>What was the problem?</strong></h4><p>Post stakeholders — whether in the newsroom or on our business side — lacked a centralized source of truth for article analytics aggregated across all platforms. They needed a fully self-service system that was easily accessible to both technical teams and business users.</p><p>While we had built tools with article data over the years, none of them were comprehensive enough to meet the needs of <strong><em>ALL</em></strong> stakeholders at scale. The lack of a holistic approach also meant that the Data and Analytics teams spent a lot of time building one-off self-service tools and dashboards for reporters, editors, leadership, and business stakeholders to see or get alerted to compelling content insights. These tools worked as targeted solutions, and data analysts closed the gaps, but they didn’t scale. It was time for The Post to zoom out: looking at the whole forest and not just specific trees. It was time to rethink content insights at The Post.</p><p>Work was getting done. 
But there was duplication across these workstreams, and both news and business leaders wanted to streamline and centralize article analytics.</p><p>Team members needed a way to quickly look up lifetime performance data of all of our content — including article metadata, performance data, conversions data, revenue data, video metrics, and <a href="https://washpost.engineering/developing-a-new-washpost-content-taxonomy-system-8203988d4026?source=collection_home---4------6-----------------------">news topic level insights that aggregate the “aboutness” of content (e.g. all articles related to “Artificial intelligence”) across Post</a>. But they also needed to look at different time-based granularities for these metrics. They needed it in one place. The data had to be sourced and normalized across all the platforms where our readers consume content, including off-platform, like on Apple News, and which Google query terms led to referral traffic from search.</p><p>Data needed to be accessible, highly QAed, well defined, and on-demand.</p><h4><strong>What did we do?</strong></h4><figure><img alt="How data flows through A360 to GRAHAM and other consumer applications" src="https://cdn-images-1.medium.com/max/1024/0*RnepQsATKWDmD8kV" /><figcaption><em>How data flows through A360 to GRAHAM and other consumer applications</em></figcaption></figure><p>The Post’s WaPo 360 and Analytics teams began to evaluate approaches to centralize all article insights into a new service called Article 360 (A360). Discovery kicked off. And lessons learned from the team’s very successful implementation of Customer 360 (C360), a centralized data warehouse that stitches together hundreds of key reader insights to provide faster analytics, were applied in developing a complementary service for articles and content insights — thus the inspiration for A360.</p><p>Net-net, we consolidated four tools into one and we deprecated two other analytics tools. 
This Post retooling changed how data-informed decisions about content are made. The user experience is cleaner and, perhaps equally important, the new A360 data service approach rethinks earlier work streams by centralizing over 100 unique content signals on every article that our talented authors write and our smart editors publish — bringing together five separate data sources and other internal Post metadata into a new home that provides a 360-degree view of content insights.</p><p>We are building something once to use it 100 times versus building 100 things to use them each once.</p><p>On day 1 of A360, the WaPo 360 and Analytics teams exposed article insights in daily, weekly, monthly, and lifetime granularities. Users and pageviews were grouped across geographies (DC Metro area, the broader U.S., and International). Insights were broken out by devices (desktop, mobile, and app), audience referral channels (email, search, social, etc.), and were split out between user types — subscribers and non-subscribers. Based on user feedback, we also took views of Post content republished on Apple News and made them available side by side with how the same content performed on washingtonpost.com and our mobile app.</p><p>Do the math: 100 unique signals, available at four time-based granularities, accessible by all readers or those in three specific geographic regions, by customer status, and by device type for every single piece of content produced daily (plus going back into our library for years). 
<strong>That’s a prism of content insights for newsroom, subscription, and ads decision makers.</strong> And as a data service, data engineers made all critical signals accessible via API so that other engineering and product teams can call them from their own Post applications.</p><p>Data is sourced from Google Analytics, Adobe Omniture (The Post’s previous analytics tool), Google Ad Manager, Google Search Console, and Apple News.</p><p>That data is then augmented by internal Post systems. But Product and Engineering did not stop there: centralized data sitting in a data warehouse, accessible only to technical users, would solve only part of the problem.</p><h4><strong>Why was it innovative?</strong></h4><p>The Washington Post is both a media and a technology company. Our Product and Engineering teams paired up with thought leaders in the newsroom and elsewhere to develop the architecture and build out A360. The Post’s Product Design team ran a series of user interviews to inform the work. And that led us to build a new centralized home for article and topic insights: GRAHAM.</p><p>While the benefits of centralizing concrete data assets and defining them to act as sources of truth are not new, the fundamentals get lost very frequently. 
AWS <a href="https://aws.amazon.com/data-warehouse/">notes the benefits of a data warehouse</a> as:</p><ul><li><em>Informed decision making</em></li><li><em>Consolidated data from many sources</em></li><li><em>Historical data analysis</em></li><li><em>Data quality, consistency, and accuracy</em></li><li><em>Separation of analytics processing from transactional databases, which improves performance of both systems.</em></li></ul><p>These were our north stars in this project with Article 360 (zooming into our content) and Customer 360 (zooming into our readers) — the bread and butter of a media company — feeding into GRAHAM.</p><p>Some of what was built simply couldn’t have been done in our legacy architecture — for instance, it is 2024: a U.S. Presidential Election year. To compare election results of previous Presidential Elections, the Analytics team would get 2020 section and subsection content from Google Analytics and earlier 2016 election data from Omniture. Data would need to be normalized on the fly and the inventory of available signals would be an apples-to-oranges comparison. Additionally, as primary and general election reporting spans more than a year, there would also need to be further analysis and remediation of data spanning multiple years in our Google Analytics flat export. This created a lot of friction and slowed down insights. Now all data is normalized, centralized, and perhaps most important of all: it is pre-processed and pre-aggregated so there isn’t a need to join different tables with SQL and wait, and wait, and wait, for queries to run and analysts to come up with answers.</p><p>The A360 approach uses a modularization pattern with the Spark DataFrame API. It allows us to experiment with new data groupings and create new metrics on the fly when ideas come from stakeholders. 
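As a sketch of that pre-aggregation idea, here is the pattern in pandas for brevity (the production pipeline uses the Spark DataFrame API, and the column names and numbers below are hypothetical, not the actual A360 schema):

```python
import pandas as pd

# Toy raw events: one row per pageview batch (hypothetical schema).
events = pd.DataFrame({
    "article_id": ["a1", "a1", "a1", "a2", "a2"],
    "device":     ["mobile", "desktop", "mobile", "app", "desktop"],
    "geo":        ["DC", "US", "Intl", "US", "DC"],
    "pageviews":  [120, 80, 40, 300, 60],
})

# Pre-aggregate once, ahead of time, so the dashboard reads a small
# rollup table instead of scanning raw events at query time.
by_device = (events
             .groupby(["article_id", "device"], as_index=False)
             .agg(pageviews=("pageviews", "sum")))
print(by_device)
```

Each new grouping (geo, referral channel, user type) is just another small module like this one, which is what makes experimenting with new cuts of the data cheap.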
And it allows GRAHAM pageload time to be very fast.</p><p>The updated data engineering approach using Spark Execution Engine and Elastic Map Reduce on AWS has led to some huge performance gains. Processes that previously would take hours or days can now be done in minutes. And some processes that simply weren’t feasible before can be done in seconds now. This means performance improvements and time saved, but also far fewer compute credits billed — insights aren’t just faster, they cost less to generate in both people-hours and processing costs.</p><p>We took the principled approach of meeting the data consumers where they were.</p><p>Article 360 data brings insights to where different teams operate — with technical teams accessing data in S3, Parquet files, Redshift, and directly in the data warehouse. Technical teams are leveraging these data service assets to create their own applications — sourcing data from our newly formed Data + AI team, which consolidated WaPo 360 and three other engineering teams into one team that harnesses data, AI, and ML to drive growth and personalization.</p><p>For business and newsroom users, data that were previously available in separate reports are now one click away in GRAHAM — a web-based React app that pays homage to our newspaper’s history and our future:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YsoRx2BKtbIlmMDX" /></figure><p>The Graham family played an important part in who we are as an institution. 
And our Product team was proud to pay homage to them, with permission from the family, in surfacing A360 data in GRAHAM, which is an acronym for <strong>G</strong>raphical <strong>R</strong>eporting of <strong>A</strong>rticle <strong>H</strong>ealth and <strong>A</strong>nalytics <strong>M</strong>etrics.</p><p>GRAHAM as a tool and A360 as a centralized data service have become our source of truth.</p><p>GRAHAM delivers over 100 pre-processed, pre-aggregated, and thoroughly QAed article signals. A user management system in GRAHAM also gates what different team members should see based on their responsibilities. Stakeholders have found it transformative. Users can easily request or create new signals and A360 can easily scale and make them available in GRAHAM.</p><p>The tools and services also remove the strain of querying Google Analytics’ raw files.</p><p>The WaPo 360 team are big believers that technology alone will fail. By following a process that puts people first and process second, we made sure to add definitions for all of the data signals in a user-friendly data dictionary and in Swagger Docs for accessing A360 data via a GRAHAM API for technical users across The Post.</p><h4><strong>Conclusion</strong></h4><p>Article 360 unlocks insights and unlocks potential, but data sitting in a table collecting dust does nobody any good. The Post moved quickly to enhance the democratization of our content data — improving how analysts and business users consume information in GRAHAM. Adoption is on target, and as we roll out the tool in 2024, we’ve aligned our roadmap to user feedback. We have plenty more innovation to come, but we’re excited for the change realized to date and the change that will be coming following a product roadshow event for our newsroom.</p><p>If content and customers are the bread and butter of a media company, The Post is now in a much better place to serve some gourmet sandwiches. 
We broke down stovepipes, consolidated tooling, and improved workstreams. Early feedback has been highly positive, and speaking of sandwiches, we look forward to supporting many data-hungry stakeholders through more automation and more self-servicing.</p><p>It was The Post’s time to rethink former approaches to content insights. Our answers may be the same ones we once found in far more manual ways, but by stitching content signals together and automating smart, repeatable cuts of that data, we can rededicate that time and work smarter rather than harder.</p><figure><img alt="A screenshot from GRAHAM from an article published in January 2024. Performance data blurred for publishing in The WashPost Engineering Blog." src="https://cdn-images-1.medium.com/max/1024/0*QHDtgfMEhLLIhSF7" /><figcaption><em>A screenshot from GRAHAM from an article published in January 2024. Performance data blurred for publishing in The WashPost Engineering Blog.</em></figcaption></figure><p><strong>Read earlier WaPo 360 articles in the WashPost Engineering Blog:</strong></p><ul><li><strong>April 2023:</strong> <a href="https://washpost.engineering/personalized-subscriber-onboarding-results-in-higher-engagement-retention-for-washington-post-d5e1450821b8">Personalized subscriber onboarding results in higher engagement, retention for Washington Post</a> (republished from the International News Media Association)</li><li><strong>July 19, 2022:</strong> <a href="https://washpost.engineering/developing-a-new-washpost-content-taxonomy-system-8203988d4026">Developing a New WashPost Content Taxonomy System</a></li><li><strong>March 8, 2022:</strong> <a href="https://washpost.engineering/doubling-down-on-personalizing-the-reader-experience-2c6d7d620eeb?gi=46aa1bd9d06a">Doubling down on personalizing the reader experience</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=02e0a323179d" width="1" height="1" 
alt=""><hr><p><a href="https://washpost.engineering/rethinking-content-insights-at-the-post-02e0a323179d">Rethinking Content Insights at The Post</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Personalized subscriber onboarding results in higher engagement, retention for Washington Post]]></title>
            <link>https://washpost.engineering/personalized-subscriber-onboarding-results-in-higher-engagement-retention-for-washington-post-d5e1450821b8?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/d5e1450821b8</guid>
            <category><![CDATA[medium]]></category>
            <category><![CDATA[personalization]]></category>
            <category><![CDATA[user-research]]></category>
            <category><![CDATA[journey-maps]]></category>
            <category><![CDATA[customer-experience]]></category>
            <dc:creator><![CDATA[Jason Langsner]]></dc:creator>
            <pubDate>Fri, 30 Jun 2023 18:37:42 GMT</pubDate>
            <atom:updated>2023-06-30T18:37:42.725Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Note:</strong> <a href="https://www.inma.org/blogs/digital-subscriptions/post.cfm/personalised-subscriber-onboarding-results-in-higher-engagement-retention-for-washington-post"><em>The WashPost Engineering Blog is republishing this blog post from Anjali Iyer, Head of Lifecycle Marketing at The Washington Post, in The International News Media Association</em></a><em>. WaPo 360 served as a technical partner with the Lifecycle Marketing team to implement the below journeys.</em></p><p>Retention begins the moment a consumer expresses interest and becomes a subscriber.</p><p>It is at this early stage where The Washington Post establishes its relationship with subscribers and can make a positive first impression.</p><h3>Investing in subscriber onboarding</h3><p>Prior to launching The Post’s new subscriber onboarding journey in fall 2022, our practice was to send a single welcome email immediately after subscription purchase. While the email highlighted important subscriber benefits, it was one touchpoint that could easily be missed or overlooked in a subscriber’s inbox.</p><p>In 2022, we revamped the subscriber onboarding journey considering feedback from consumer research and analytical insights. We built a comprehensive onboarding journey with tailored email messages over the course of multiple months utilising customer 360 signals and a journey builder.</p><p>This was a game changer as it unlocked personalisation capabilities at the subscriber level. Consumer research also indicated an interest in comprehensive news and interaction with our products and newsroom beyond the dominant headlines from the past few years, such as the COVID-19 pandemic.</p><p>Subscriber onboarding is important in achieving subscription marketing goals for growing consumption, reducing churn and increasing customer lifetime value (CLV). 
It sets the tone for the overall relationship with new subscribers and helps achieve several key objectives:</p><ul><li><strong>Positive subscriber experience:</strong> Onboarding is a way to introduce new subscribers to benefits and features, establish a relationship and build loyalty.</li><li><strong>Reduce churn:</strong> Churn is the rate at which subscribers cancel their subscriptions. Effective onboarding can help reduce churn by providing new subscribers with the information and resources to build repeat habits and realise the value of their subscription.</li><li><strong>Increase customer lifetime value:</strong> Subscribers who visit often tend to use multiple features and benefits, read newsletters and have increased satisfaction and a high net promoter score (NPS), thus leading to higher retention rates, which are a key driver of CLV.</li></ul><p>In this journey, we are focusing on consumption since it is a key driver of both subscription revenue and ad revenue. It directly impacts subscribers’ willingness to spend time with our premium content, drive higher traffic, interact with ads and, as they start to value the content, likely renew.</p><h3>Our approach</h3><p>We leveraged internal and external research combined with machine learning to build an automated new subscriber journey with defined success metrics. The journey has algorithm triggers to proactively engage subscribers to visit us and build everyday habits so we can increase frequency and recency of visits to drive revenue and lifetime value.</p><p>We know, based on internal research, that subscribers who have zero to two visits have a 15% lower retention rate compared to subscribers with more than 15 visits. 
Also, subscribers who use the app and visit more than 14 times have a higher NPS than others.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/760/0*e3Pj7v9-LFGpMeWV.jpg" /></figure><h3>Guiding principles on the subscriber journey</h3><p>We identified three key guiding principles on the subscriber journey:</p><ul><li><strong>Be subscriber-centric.</strong> It’s about them. In other words, we get to know their interests and their preferences about when they want to receive communications from us.</li><li><strong>Be relevant.</strong> This builds strong relationships and loyalty.</li><li><strong>Ensure consistent messaging across channels and devices.</strong></li></ul><figure><img alt="Examples of pop-ups on the Washington Post website asking readers about the types of content they like and when they like to read newsletters" src="https://cdn-images-1.medium.com/max/760/0*xz3UDRkKd6oJtf0X.jpg" /></figure><p>Our current subscriber journey includes multiple email messages designed to educate, engage and inspire subscribers to act. Subscribers are given engaging content as they go through the journey in the first few weeks to drive recency and frequency of visits and, ultimately, build habits and loyalty. We have refined the journey to ensure that the right combination of actions work together at the right time.</p><p>The journey starts with recommended on-site onboarding actions and then smoothly transitions into triggered email communications. The first step in on-site onboarding is for subscribers to select their interests, time of day, newsletter recommendation and app download.</p><p>This is important personal data collection, as it helps us better understand customer needs and preferences to later make informed decisions to provide personalised experiences.</p><p>The first email is a personal welcome note from our authors to build connections with subscribers by sharing the mission of The Post. 
It is powerful and a chance to share with subscribers what we believe in and an affirmation for why they are with us.</p><p>As the email series continues, each subscriber receives tailored messages based on their individual actions. Our goal is to ensure that subscribers only receive information that is most relevant to their personal interaction with us thus far.</p><figure><img alt="Examples of Washington Post subscriber onboarding emails, including a signed welcome letter and follow-up emails asking how users want to peruse the news and informing them of ways to read our content" src="https://cdn-images-1.medium.com/max/760/0*ayXArrxMI-GUfFLx.jpg" /></figure><p>For example, in Week 1, the automated journey checks to see which onboarding actions different subscribers have completed. Subscribers who have not initiated or completed the recommended actions get targeted emails.</p><p>In the same week, premium subscribers who haven’t shared their associate subscription receive an email encouraging them to do so.</p><p>In both scenarios, behavioral data is used to tailor the onboarding experience for personalisation at the subscriber level.</p><p>Over the next few weeks, emails are deployed highlighting content ranging from exclusive investigations and opinions to educating subscribers on Washington Post (WP) Live events, gifting articles and exploring The Post via other platforms like a podcast or TikTok.</p><h3>The subscriber journey</h3><p>This diagram is an example of what the subscriber journey might look like:</p><figure><img alt="A road passes through the following landmarks: onsite onboarding, personalized welcome letter, algorithmic triggers, reinforce benefits and relevant content" src="https://cdn-images-1.medium.com/max/760/0*JSkyutPQtbMeHxrm.jpg" /></figure><p>Many of these benefits and features were selected based on analytical insights. There is a strong correlation between these actions and retention. 
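</p><p>The Week 1 branching described above can be sketched as follows. This is an illustrative sketch only: the actual journey builder, signal names and email identifiers are internal and are assumptions here.</p>

```python
# Illustrative-only sketch of Week 1 journey branching; the real journey
# builder, subscriber signals and email IDs are internal assumptions.
def week_one_emails(subscriber: dict) -> list[str]:
    emails = []
    # Subscribers who have not initiated or completed the recommended
    # onboarding actions get a targeted email.
    if not subscriber.get("onboarding_complete"):
        emails.append("complete-onboarding-nudge")
    # Premium subscribers who haven't shared their associate subscription
    # receive an email encouraging them to do so.
    if subscriber.get("tier") == "premium" and not subscriber.get("associate_shared"):
        emails.append("share-associate-subscription")
    return emails
```

<p>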
Analysis showed that subscribers who gift articles or attend WP Live events are less likely to churn. Our goal is to educate subscribers about these benefits in their early tenure to positively impact retention and customer lifetime value.</p><h3>Measurement plan</h3><p>We have identified primary and driver metrics to gauge increases in engagement and retention, and implemented a control group to measure the journey’s effectiveness.</p><p>Primary metrics are:</p><ul><li>Article pageviews</li><li>Total visits and days visited in 30 days</li><li>Habit adoption (newsletter engagement, app downloads, interests and author follows)</li><li>Retention over 90 days</li></ul><h3>Results</h3><p>Since launch, our new subscriber onboarding journey has resulted in a 2% lift in retention and an increase in three-year CLV after 12 weeks.</p><h3>Next iteration</h3><p>To be relevant, journeys must constantly be reevaluated and enhanced. Our immediate next step is expanding to on-site and in-app messages, since not all subscribers can be reached via email. 
To be effective, subscribers must be communicated to in their preferred channel.</p><p>Further experimentation is planned to understand what messages resonate on which channels, such as podcasts or audio listening.</p><p><strong>Read earlier WaPo 360 articles in the WashPost Engineering Blog:</strong></p><p>July 19, 2022: <a href="https://washpost.engineering/developing-a-new-washpost-content-taxonomy-system-8203988d4026">Developing a New WashPost Content Taxonomy System</a></p><p>March 8, 2022: <a href="https://washpost.engineering/doubling-down-on-personalizing-the-reader-experience-2c6d7d620eeb?gi=46aa1bd9d06a">Doubling down on personalizing the reader experience</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d5e1450821b8" width="1" height="1" alt=""><hr><p><a href="https://washpost.engineering/personalized-subscriber-onboarding-results-in-higher-engagement-retention-for-washington-post-d5e1450821b8">Personalized subscriber onboarding results in higher engagement, retention for Washington Post</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Here’s how The Washington Post verified its journalists on Mastodon]]></title>
            <link>https://washpost.engineering/heres-how-the-washington-post-verified-its-journalists-on-mastodon-7b5dbc96985c?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/7b5dbc96985c</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[journalism]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[social-media]]></category>
            <category><![CDATA[mastodon]]></category>
            <dc:creator><![CDATA[Chris Zubak-Skees]]></dc:creator>
            <pubDate>Mon, 06 Mar 2023 14:19:36 GMT</pubDate>
            <atom:updated>2023-03-07T06:52:02.934Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Screenshot shows what the new verification looks like on Drew Harwell’s Mastodon account. Underneath his bio and the date he joined Mastodon, there is a dark green box with light green text that says “Post,” shows a checkmark and the link to Harwell’s bio page on The Washington Post’s website." src="https://cdn-images-1.medium.com/max/1024/1*aYk3YuiqVnI6wj5g5G1kFA.png" /><figcaption>Drew Harwell’s new Mastodon verified status</figcaption></figure><p>A small cross-disciplinary team of engineers worked together to add a feature so journalists at The Washington Post could link their Mastodon profiles from The Post’s website and verify themselves on the social network.</p><p>In mid-November, we started thinking about how to best support our journalists as they explored Twitter alternatives. Engineering director Jeremy Bowers put out a <a href="https://journa.host/@jeremybowers/109362893129821565">public call for ideas</a> on his Mastodon profile, and received a range of responses, including hosting our own Mastodon instance or verifying our journalists. After Twitter suspended several Post journalists in December, we activated a small team of engineers and identified specific technical projects we could build relatively quickly.</p><p>Hosting our own instance would be a major long-term commitment, which requires time to evaluate and would put our eggs in one basket. We wanted to find something we could do today and support journalists now.</p><p>Verification, on the other hand, supports our journalists who are engaging with audiences in various corners of the Fediverse, as the distributed network behind Mastodon is known, regardless of which instance they choose. It lets Mastodon users know if they’re interacting with someone who writes for The Post, which encourages trust in the source of the information they’re reading.</p><p>Verification on Mastodon isn’t dependent on a central authority like Twitter. 
No one person can sell verification or revoke it. Like the rest of the Fediverse, it’s built on open standards. The distributed nature of Mastodon poses unique challenges. But the distributed nature of the network also lets us build on it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uLiIl3xKpkXdc1F5rnaA1Q.png" /><figcaption>A Mastodon link on Merrill’s <a href="https://www.washingtonpost.com/people/jeremy-b-merrill/">author page</a>.</figcaption></figure><p>In some ways, verification is simple: We added a special link on author pages that Mastodon could check to verify that a particular Mastodon user is who they say they are. When a reporter adds the author page to their Mastodon profile, Mastodon fetches the page and looks for a link back to the reporter’s account. If it’s found, Mastodon adds a verification checkmark. This tells Mastodon users the account they’re looking at is actually the Washington Post author they claim to be.</p><p>Thankfully, this isn’t specific to Mastodon alone — this method is a standard supported by other networks, which means The Post can support verification elsewhere in the future.</p><p>Post engineer Holden Foreman, who is now The Post’s <a href="https://www.washingtonpost.com/pr/2023/01/25/first-ever-accessibility-engineer-named-washington-post/">first accessibility engineer</a> and had been contributing to the Mastodon project in his spare time, authored a code change to show the links on the pages. Rob Cannon and Tyler Fisher drafted a way for authors to add their own profile links in the website’s backend.</p><p>We were able to verify our first test accounts. So far, so good, but even a seemingly simple system can pose unexpected challenges when distributed across the internet.</p><p>The first hurdle came when Mastodon failed to verify some of our first adopters, including technology reporters Jeremy B. Merrill and Drew Harwell. 
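</p><p>Conceptually, the check is simple. A minimal sketch (not The Post’s or Mastodon’s actual code) of scanning a fetched author page for a rel="me" link back to the profile might look like:</p>

```python
import re

def has_me_link(html: str, profile_url: str) -> bool:
    """Sketch of a rel="me" verification check: does the page contain an
    <a> or <link> tag with rel="me" whose href points at profile_url?
    (Mastodon's real implementation parses HTML properly and reads only
    the leading portion of very large pages.)"""
    for tag in re.findall(r"<(?:a|link)\b[^>]*>", html, flags=re.IGNORECASE):
        if re.search(r'rel=["\'][^"\']*\bme\b', tag) and profile_url in tag:
            return True
    return False
```

<p>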
We examined the verification code but couldn’t find an obvious explanation.</p><p>To debug this, Cannon searched the logs from our content delivery network, Akamai. It turns out that each Mastodon instance on which Merrill has followers requests the author page separately. In his case, that added up to more than 60 instances in a small sample, each independently requesting the page! Aspects of these requests tripped some of the anti-bot filtering that Akamai uses to protect our web site, which occasionally blocked verification.</p><p>But after addressing that, verification still failed. We were stumped until Dylan Freedman <a href="https://mastodon.social/@dylan@journa.host/109558740029078267">asked his followers</a> on Mastodon for help. A user responded that Mastodon had a one megabyte limit on pages it requested while verifying. Most of our author pages were larger than a megabyte!</p><p>I authored a change to the Mastodon open source project’s verification code which looks at the first megabyte, instead of limiting the page to one megabyte, while Christian Stroh shrank our author pages down by cutting excess page weight. Cannon built a <a href="https://mastodon-link-debugger.vercel.app/">public tool</a> to check that all the requirements for Mastodon verification have been met, which made this easier to debug.</p><p>At last, most journalists could successfully verify!</p><p>We hope you’ll follow our newly-verified <a href="https://mastodon.social/@drewharwell/109711258904364441">reporters on Mastodon</a>, especially those who have been at times suspended or hidden from search on Twitter, including <a href="https://mastodon.social/@drewharwell">Drew Harwell</a> and <a href="https://journa.host/@jeremybmerrill">Jeremy B. 
Merrill</a>.</p><p>Wherever Post journalists go next, engineers will be working behind the scenes to support them.</p><p><em>This was a team effort by Rob Cannon, Holden Foreman, Tyler Fisher, Jeremy Bowers, Dylan Freedman, Christian Stroh, Jeremy B. Merrill and others. Special thanks to the Mastodon community for their contributions.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7b5dbc96985c" width="1" height="1" alt=""><hr><p><a href="https://washpost.engineering/heres-how-the-washington-post-verified-its-journalists-on-mastodon-7b5dbc96985c">Here’s how The Washington Post verified its journalists on Mastodon</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Leveraging AMP email to deliver real-time midterm election results]]></title>
            <link>https://washpost.engineering/leveraging-amp-email-to-deliver-real-time-midterm-election-results-5123928f9fd?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/5123928f9fd</guid>
            <category><![CDATA[elections]]></category>
            <category><![CDATA[email-marketing]]></category>
            <category><![CDATA[news]]></category>
            <category><![CDATA[html]]></category>
            <category><![CDATA[amp-email]]></category>
            <dc:creator><![CDATA[Benjamin]]></dc:creator>
            <pubDate>Wed, 15 Feb 2023 21:19:21 GMT</pubDate>
            <atom:updated>2023-02-15T21:19:21.356Z</atom:updated>
            <content:encoded><![CDATA[<h3>What is AMP email and how can it be used?</h3><p>AMP (Accelerated Mobile Pages) for email is a <a href="https://amp.dev/about/email/">framework</a> that enables interactive app-like behavior and dynamic content within the inbox. Through the use of specific amp-components, users can fill out and submit forms, view content in real-time and engage in interactive modules such as games and quizzes all within an email.</p><p>Although it was released back in 2019, AMP for email has not become widely adopted for a number of reasons. At the time of this writing, AMP support remains limited to four email clients: Gmail, Yahoo! Mail, AOL Mail, and Mail.ru. Depending on the makeup of the email clients within a subscriber list, this can potentially leave out many users from the AMP email experience.</p><p>Also, there are many tech requirements that need to be addressed before AMP emails can be sent. Companies need to use an email service provider (ESP) that fully supports AMP on the backend, email addresses for senders need to undergo a time-consuming registration process to become whitelisted and approved to send AMP emails, and developers need to complete double the amount of work by coding an AMP email template as well as a separate HTML template for email clients that do not support AMP. These are some of the reasons why many companies have decided against using this technology in their email programs.</p><p>However, if companies are able to plan for and move past these obstacles, AMP has the potential to create valuable user experiences and improve upon certain limitations of standard emails. For example, standard email content cannot be updated or modified after it is sent. 
During rapidly changing situations such as fluctuating stock markets, live sporting events and fast-moving election results, emails sent in these environments can become outdated and less relevant by the time they’re opened by the users.</p><p><strong>Can AMP email be leveraged to create more useful and informative experiences during a rapidly changing event such as election results coverage?</strong> Yes, this blog post will discuss how teams at The Washington Post leveraged AMP to deliver live, up-to-date midterm election race highlights whenever the email is opened.</p><h3><strong>Preparing and Building the AMP Template</strong></h3><p>In preparation for the 2022 midterm elections, the project team decided to build a live updating feed (LUF) that displayed the latest election headlines and placed it into the <a href="https://www.washingtonpost.com/newsletters/politics-news-alerts/">politics news alert email campaign</a>. By using a preexisting email campaign, users with email clients that do not support AMP would continue to receive the same alert email in future campaigns. Their email experience would remain consistent and unchanged.</p><figure><img alt="A sample of the breaking politics news alert HTML email campaign." src="https://cdn-images-1.medium.com/max/1024/1*XmnfMwWwfRdyJ5c06fyyZA.png" /><figcaption>A sample of the breaking politics news alert HTML email campaign.</figcaption></figure><p>For the AMP email version, a new and separate email template was built following the <a href="https://amp.dev/documentation/guides-and-tutorials/learn/email_fundamentals/?format=email">AMP specifications and requirements</a>. The AMP template design copied the overall HTML alerts email design to maintain a similar experience. Then progressive enhancements were built into the AMP template using the following <a href="https://amp.dev/documentation/guides-and-tutorials/learn/email-spec/amp-email-components/?format=email">amp-components</a>:</p><p>1. 
The date in the alerts bar was replaced with a timestamp using the amp-timeago component. This switch was made to provide users with a clearer sense of how much time passed since the alert message was delivered.</p><pre>&lt;amp-timeago layout=&quot;fixed-height&quot; width=&quot;auto&quot; height=&quot;28&quot; datetime=&quot;{{ LUF_TIMESTAMP }}&quot; locale=&quot;en&quot; style=&quot;font: normal 22px/120% &#39;FranklinLight&#39;, sans-serif; font-size: 15px; color: #ffffff;&quot;&gt;{{ LUF_TIMESTAMP }}&lt;/amp-timeago&gt;</pre><figure><img alt="An arrow pointing to the amp-timeago component displaying how much time has passed since the AMP email was sent" src="https://cdn-images-1.medium.com/max/1024/1*JfkJqidwIRv_er0IPm1L7A.png" /></figure><p>2. The LUF module consisted of a title, three news headlines with individual timestamps, and a link at the bottom that directed users to the main Washington Post elections coverage. Users were able to click on the headlines and read the full article in a separate webpage. This module was built using the amp-list, amp-mustache, and amp-anim components. The amp-list component connected the module to a secure JSON endpoint bringing in headline text, timestamp data, and URLs. The amp-mustache component rendered this JSON data in the email. 
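</p><p>For illustration, the JSON feed that amp-list fetches could be shaped like the following. The field names mirror the mustache placeholders above; the exact production schema and values are assumptions:</p>

```json
{
  "items": [
    {
      "HEADLINE_TEXT": "Example election headline",
      "TIMESTAMP_DATA": "2022-11-10T21:15:00-05:00",
      "ARTICLE_URL": "https://www.washingtonpost.com/..."
    }
  ]
}
```

<p>By default, amp-list expects the response to wrap its content in an items array, which the amp-mustache template then iterates over.</p><p>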
Whenever the email is opened, the JSON data would load with the latest election headlines.</p><pre>&lt;amp-list layout=&quot;container&quot; src=&quot;{{ JSON_DATA_URL }}&quot; template=&quot;LUFheadlines&quot; id=&quot;LUFmodule&quot; style=&quot;height:400px;&quot;&gt;&lt;div class=&quot;loadingText&quot; placeholder&gt;Loading ...&lt;/div&gt;&lt;div class=&quot;loadingText&quot; fallback&gt;Failed to load data.&lt;/div&gt;&lt;/amp-list&gt;<br>&lt;template type=&quot;amp-mustache&quot; id=&quot;LUFheadlines&quot;&gt;<br>  &lt;ul&gt;<br>    &lt;li&gt;&lt;a href=&quot;{{ ARTICLE_URL }}&quot;&gt;&lt;amp-timeago layout=&quot;fixed-height&quot; width=&quot;auto&quot; height=&quot;22&quot; cutoff=&quot;86400&quot; datetime=&quot;{{ TIMESTAMP_DATA }}&quot; locale=&quot;en&quot;&gt;{{ TIMESTAMP_DATA }}&lt;/amp-timeago&gt;{{ HEADLINE_TEXT }}&lt;/a&gt;&lt;/li&gt;<br>  &lt;/ul&gt;<br>&lt;/template&gt;</pre><figure><img alt="A red outline highlighting the LUF module at the bottom of the AMP alerts email template." src="https://cdn-images-1.medium.com/max/1024/1*w4dlrTDN3QU3SdMf7yXZEw.png" /></figure><p>3. Lastly, the team used the amp-anim component to display a blinking red dot at the top of the LUF module to indicate a real-time, live experience and to draw the user’s attention.</p><pre>&lt;amp-list layout=&quot;responsive&quot; height=&quot;10&quot; width=&quot;10&quot; style=&quot;height:10px;width:10px;&quot; src=&quot;{{ JSON_DATA_URL }}&quot; template=&quot;LUFdot&quot; single-item items=&quot;lufSection&quot;&gt;&lt;/amp-list&gt;<br>&lt;template type=&quot;amp-mustache&quot; id=&quot;LUFdot&quot;&gt;&lt;amp-anim width=&quot;10&quot; height=&quot;10&quot; src=&quot;{{ RED_DOT }}&quot; alt=&quot;Blinking red dot&quot;&gt;&lt;/amp-anim&gt;&lt;/template&gt;</pre><figure><img alt="An arrow pointing to the amp-anim component featuring a blinking red dot at the top of the LUF module." 
src="https://cdn-images-1.medium.com/max/1024/1*qP04TdasifTevHK97sH23g.png" /></figure><h3>AMP Email Demos</h3><p>On Nov. 3, the AMP email campaign for breaking news alerts was launched and users were able to view and engage with the LUF module. The AMP email below was sent on Nov. 10 at 9:28 p.m. ET and is shown opened on the Gmail website:</p><figure><img alt="The breaking news alerts AMP email opened in Gmail and slowly scrolling from top to bottom and back to the top" src="https://cdn-images-1.medium.com/max/800/1*jkCjsjKDGz0UqRL-25S1Eg.gif" /></figure><p>Below, the AMP email opened on the Gmail app in iOS:</p><figure><img alt="The breaking news alerts AMP email opened in the Gmail iPhone app slowly scrolling from top to bottom and back to the top" src="https://cdn-images-1.medium.com/max/400/1*C0C_Q-bk9uGY8Q8fi7_duQ.gif" /></figure><p>If this AMP email is opened at a later time, the most up-to-date headlines with corresponding timestamps will display in the LUF module. Users are able to click on these new headlines to read the full article on a separate webpage. This AMP email creates a dynamic news experience for users at the time of opening while also adding value to past breaking news alert emails that would have otherwise contained stale content. After 30 days, the AMP email expires and the HTML fallback version is displayed in its place.</p><h3>What’s Next</h3><p>AMP for Email requires rigorous planning and preparation before a single email campaign can be launched. 
However, once those steps are fulfilled, there are vast opportunities to experiment and innovate on new email experiences that could benefit users.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5123928f9fd" width="1" height="1" alt=""><hr><p><a href="https://washpost.engineering/leveraging-amp-email-to-deliver-real-time-midterm-election-results-5123928f9fd">Leveraging AMP email to deliver real-time midterm election results</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My goals as The Washington Post’s first-ever Accessibility Engineer]]></title>
            <link>https://washpost.engineering/my-goals-as-the-washington-posts-first-ever-accessibility-engineer-453adeb13d51?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/453adeb13d51</guid>
            <category><![CDATA[accessibility]]></category>
            <category><![CDATA[journalism]]></category>
            <category><![CDATA[disability]]></category>
            <category><![CDATA[medium]]></category>
            <category><![CDATA[web-development]]></category>
            <dc:creator><![CDATA[Holden Foreman]]></dc:creator>
            <pubDate>Wed, 25 Jan 2023 15:52:34 GMT</pubDate>
            <atom:updated>2023-01-25T15:52:34.380Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="A stick figure stands inside a dark blue circle with arms outstretched. There is another, thinner blue circle outlining the one in which the stick figure stands. The background color, including the stick figure’s body and space between the circles, is white. This icon is commonly associated with web accessibility." src="https://cdn-images-1.medium.com/max/615/1*3UiFA7Zg_LERUyEbrTSZTA.jpeg" /><figcaption>Image by <a href="https://commons.wikimedia.org/wiki/User:Dave_Braunschweig">Dave Braunschweig</a> (<a href="https://en.wikipedia.org/wiki/en:Creative_Commons">Creative Commons</a> <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">Attribution-Share Alike 4.0 International</a> license)</figcaption></figure><p>Note: To make this blog post understandable to more readers, we also published <a href="https://washpost.engineering/my-goals-as-the-washington-posts-new-accessibility-engineer-in-plain-language-492c7c61ae9e?source=collection_home---4------0-----------------------">a shorter version in plain language</a>. If you have questions regarding either post, please contact me via <a href="mailto:holden.foreman@washpost.com">email</a>, <a href="https://twitter.com/hsforeman">Twitter</a>, <a href="https://www.linkedin.com/in/holdenforeman">LinkedIn</a> or <a href="https://twitter.com/hsforeman">Ma</a><a href="https://mas.to/@hsforeman">stodon</a>.</p><p>The Washington Post is furthering its commitment to accessibility, and the work employees are already doing to make our content more accessible, by naming its first-ever dedicated Accessibility Engineer. 
I’ve long cared about accessibility, and I’m excited to serve in this new role.</p><p>So what do we mean by accessibility?</p><p>According to <a href="https://accessibility.iu.edu/understanding-accessibility/">Indiana University</a>, “Accessibility is the degree to which a product, device, service, environment, or facility is usable by as many people as possible, including by persons with disabilities.” A common example of accessibility in practice is including wheelchair-accessible seats at event venues, which may be reachable via ramps and/or elevators instead of only stairs.</p><p>At The Post, we care about the accessibility of our digital products. We have <a href="https://www.washingtonpost.com/accessibility/">a statement on our website</a> outlining some of our goals. We aim to include <a href="https://support.microsoft.com/en-us/topic/df98f884-ca3d-456c-807b-1a1fa82f5dc2">alternative text (alt text)</a> on images, captions on videos and transcriptions of audio content such as podcasts. We also know that accessibility considerations stretch far beyond these fundamental measures. The underlying code of our products should be written with accessibility in mind. There are extensive guidelines to consider, such as <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">the Web Content Accessibility Guidelines (WCAG) international standard</a>. I’ve blogged in the past about some of our steps toward <a href="https://washpost.engineering/improving-screen-reader-and-keyboard-accessibility-on-election-results-pages-e6f6cdbab25f?source=collection_home---4------2-----------------------">screen reader</a> and <a href="https://washpost.engineering/keyboard-focus-helping-users-navigate-the-post-without-a-mouse-83c572385240?source=collection_home---4------8-----------------------">keyboard</a> accessibility.</p><p>We know at The Post that everyone must care about accessibility in order for it to happen. 
Having an engineer dedicated to accessibility will help us align our efforts, maintain up-to-date standards and explore new opportunities in research and feature development. The Accessibility Engineer will also help educate others on the latest accessibility practices and will be a resource for internal support.</p><p>I like Indiana University’s definition of accessibility because it emphasizes two things:</p><ol><li>Accessibility is not something you either “have” or “don’t have.” There are levels to it. Your website may have appropriate alt text for some images but not all of them. You may have considered certain forms of color blindness, but not others, when choosing your color palette.</li><li>Accessibility is important for everyone. <a href="https://ssir.org/articles/entry/the_curb_cut_effect">“The curb-cut effect”</a> describes how cutting ramps into sidewalk curbs made the sidewalks more accessible not just to people in wheelchairs but to many others as well: people pushing strollers or carts, for instance.</li></ol><p>Parts of WCAG relate to assistive technology, which is used primarily by people with disabilities to help them navigate digital content. An example of this is a screen reader, which may be used by people who are blind or low-vision to hear digital content described to them. They use controls to move between elements on a page, and the associated text and/or function is read aloud. This is one purpose of alt text; it is read by a screen reader to describe an image for those who cannot see it clearly. 
Similarly, interactive elements should often be labeled in code using <a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-label">aria-label</a> attributes.</p><p>While standards are helpful, accessibility is more than meeting a checklist and calling it a day.</p><p>We want to create a dialogue with our users to better understand how you interact with our products and how we can make your experiences not only accessible but also user-friendly and efficient. Some changes may be as simple as adjusting labeling and page structures, while others may involve developing brand-new features and experiences. And others may require updating the internal products that are used by The Post’s employees.</p><p>An essential part of my new role is doing my research and listening to you, The Post’s users. So please send any questions, feedback and thoughts that you have to accessibility@washpost.com. We can’t promise a response to everyone, but we will monitor our inbox carefully as we continue along this never-ending journey.</p><p>If you have questions for me personally or notice any errors in this blog post, please don’t hesitate to contact me via <a href="mailto:holden.foreman@washpost.com">email</a>, <a href="https://twitter.com/hsforeman">Twitter</a>, <a href="https://www.linkedin.com/in/holdenforeman">LinkedIn</a> or <a href="https://mas.to/@hsforeman">Mastodon</a>. 
Thanks so much for reading.</p><hr><p><a href="https://washpost.engineering/my-goals-as-the-washington-posts-first-ever-accessibility-engineer-453adeb13d51">My goals as The Washington Post’s first-ever Accessibility Engineer</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My goals as The Washington Post’s new Accessibility Engineer (in plain language)]]></title>
            <link>https://washpost.engineering/my-goals-as-the-washington-posts-new-accessibility-engineer-in-plain-language-492c7c61ae9e?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/492c7c61ae9e</guid>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[medium]]></category>
            <category><![CDATA[journalism]]></category>
            <category><![CDATA[accessibility]]></category>
            <category><![CDATA[disability]]></category>
            <dc:creator><![CDATA[Holden Foreman]]></dc:creator>
            <pubDate>Wed, 25 Jan 2023 14:51:53 GMT</pubDate>
            <atom:updated>2023-01-25T15:54:21.691Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="A stick figure stands inside a dark blue circle with arms outstretched. There is another, thinner blue circle outlining the one in which the stick figure stands. The background color, including the stick figure’s body and space between the circles, is white. This icon is commonly associated with web accessibility." src="https://cdn-images-1.medium.com/max/615/1*3UiFA7Zg_LERUyEbrTSZTA.jpeg" /><figcaption>Image by <a href="https://commons.wikimedia.org/wiki/User:Dave_Braunschweig">Dave Braunschweig</a> (<a href="https://en.wikipedia.org/wiki/en:Creative_Commons">Creative Commons</a> <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">Attribution-Share Alike 4.0 International</a> license)</figcaption></figure><p>Note: This is the plain language version of <a href="https://washpost.engineering/my-goals-as-the-washington-posts-first-ever-accessibility-engineer-453adeb13d51">my Accessibility Engineer blog post</a>. If you have questions, please contact me via <a href="mailto:holden.foreman@washpost.com">email</a>, <a href="https://twitter.com/hsforeman">Twitter</a>, <a href="https://www.linkedin.com/in/holdenforeman">LinkedIn</a> or <a href="https://mas.to/@hsforeman">Mastodon</a>.</p><p>I am The Washington Post’s new Accessibility Engineer. I am excited to serve in this new role.</p><p>To me, accessibility is about making something useful to as many people as possible. This includes people with disabilities. Some examples of accessibility are:</p><ul><li>ramps and elevators for people in wheelchairs</li><li>braille for people who are blind or have low vision</li></ul><p>I am a software engineer. I focus on the accessibility of things online. Some of those things are:</p><ul><li>alternative text, also called alt text or image descriptions</li><li>captions for videos</li></ul><p>Accessibility includes a lot of other things. 
There are standards that software engineers should consider, like <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">the Web Content Accessibility Guidelines international standard</a>. It is often called WCAG.</p><p>The Washington Post cares about accessibility, and it is helpful to have an Accessibility Engineer. This job will give me time to focus on accessibility. I will help teach my coworkers about accessibility, too.</p><p>Accessibility is not only about standards and checklists.</p><p>We want to understand how you use The Washington Post’s products. Some of our products are:</p><ul><li>our website</li><li>our mobile apps</li><li>our newspapers</li></ul><p>Do you have accessibility questions, ideas or feedback for The Washington Post? Please send them to accessibility@washpost.com. We might not respond to everyone. But part of my new job is listening to you.</p><p>If you have questions or feedback just for me, then please send them to my <a href="mailto:holden.foreman@washpost.com">email</a>, <a href="https://twitter.com/hsforeman">Twitter</a>, <a href="https://www.linkedin.com/in/holdenforeman">LinkedIn</a> or <a href="https://mas.to/@hsforeman">Mastodon</a>. Thank you so much for reading.</p><hr><p><a href="https://washpost.engineering/my-goals-as-the-washington-posts-new-accessibility-engineer-in-plain-language-492c7c61ae9e">My goals as The Washington Post’s new Accessibility Engineer (in plain language)</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[We’re open sourcing our live election night model!]]></title>
            <link>https://washpost.engineering/were-open-sourcing-our-live-election-night-model-a21bcb2a46c6?source=rss----671ad4334cb2---4</link>
            <guid isPermaLink="false">https://medium.com/p/a21bcb2a46c6</guid>
            <dc:creator><![CDATA[Emily Liu]]></dc:creator>
            <pubDate>Wed, 14 Sep 2022 17:32:30 GMT</pubDate>
            <atom:updated>2022-09-14T17:32:30.500Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>We’re open sourcing our live election night model</strong></h3><p><em>by </em><a href="https://twitter.com/_emilyliu_"><em>Emily Liu</em></a><em> and </em><a href="https://twitter.com/lennybronner"><em>Lenny Bronner</em></a></p><p>We have model news again! We’ve spent the last six months re-implementing our election night model in Python and are excited to finally open source this version and share it with the community.</p><p><a href="https://github.com/washingtonpost/elex-live-model">GitHub - washingtonpost/elex-live-model: a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race</a></p><p><a href="https://github.com/washingtonpost/2020-election-night-model">Our original election night model</a> was written in R. While this was good as a first approach, it led to a few constraints around deployment, maintenance and flexibility. We decided it was time to merge our model with our existing infrastructure.</p><h3><strong>But what is the model?</strong></h3><p>At The Washington Post, we use our model to generate estimates of the number of outstanding votes on an election night. We also generate more granular estimates — for general elections, we estimate the partisan split of outstanding votes; for primaries, we split these estimates by candidate.</p><p>Our election model provides useful context on where the final vote may end up at the conclusion of the race. Raw vote totals are often misleading when presented independently, due to factors such as the rise in absentee and mail-in voting and the fact that geographic areas report results at different rates (e.g., rural areas often report results earlier than urban areas). 
You may have seen the output of our model on our results pages as these fuzzy bars.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*wyatrZt7fi5ssnxx" /><figcaption><em>Example of our model visualization from November 2020</em></figcaption></figure><p>We estimate the turnout and vote total numbers by comparing live tallies to previous election results. We then extrapolate the difference between the live and historical results for geographic subunits (counties or precincts) that have yet to report results. The model is an implementation of quantile regression with demographics as features, and we use <a href="https://en.wikipedia.org/wiki/Conformal_prediction">conformal prediction</a> to quantify the associated uncertainty.</p><p>For the sake of this blog post, we’ll focus on the architecture decisions we made from an engineering perspective. For more information on how we calculate the estimates, you can read more <a href="https://www.washingtonpost.com/politics/2022/05/17/post-election-night-model/">here</a> and read a graphic explainer of how election models work <a href="https://www.washingtonpost.com/elections/interactive/2022/how-election-models-work/">here</a>. Further reading is linked at the end of this post.</p><h3><strong>From R to Python</strong></h3><p>In 2020, we built and open-sourced <a href="https://github.com/washingtonpost/2020-election-night-model">an election model</a> that was written in R. The model was hosted on a server via Amazon EC2 that our election pipeline could access via an API. Unfortunately, R is not a very popular language on our team (sorry!) and we were severely constrained by the response time from pinging the model API. Additionally, it was not as fast, scalable, or interoperable with AWS as we wanted, and it was hard to share code written in R with the rest of our codebase. 
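<p>The extrapolation step described earlier (scaling the historical totals of unreported counties by the turnout growth seen in counties that have already reported) can be sketched in a few lines of pandas. This is a toy illustration under our own assumptions: the column names and the simple mean-growth rule are ours, not the model’s actual demographic features or quantile-regression estimator.</p>

```python
import pandas as pd

# Toy data: historical totals and live (partial) returns per county.
counties = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "historical_votes": [10_000, 20_000, 5_000, 15_000],
    "live_votes": [11_000, 22_000, None, None],  # C and D haven't reported yet
})

# Average turnout growth among counties that have reported.
reported = counties.dropna(subset=["live_votes"])
growth = (reported["live_votes"] / reported["historical_votes"]).mean()

# Extrapolate: unreported counties get their historical totals scaled by that
# growth; reported counties keep their live totals.
unreported = counties["live_votes"].isna()
counties.loc[unreported, "estimated_votes"] = (
    counties.loc[unreported, "historical_votes"] * growth
)
counties.loc[~unreported, "estimated_votes"] = counties.loc[~unreported, "live_votes"]

estimated_total = counties["estimated_votes"].sum()
```

<p>With both reporting counties up 10% on their historical totals, the two unreported counties are scaled by the same factor, giving a statewide estimate rather than a misleading partial count.</p>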
Similarly, it was also difficult to parse errors and debug whether errors came from the API or the model implementation itself, especially under a time crunch during an election. We wanted a better way to deploy the model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*DJAGOrLd9PbVsx45" /><figcaption><em>Simplified diagram of our former pipeline</em></figcaption></figure><p>As a result, after the November 2021 general election, we began the process of rewriting this model from R to Python.</p><p>A lot of the model re-write simply meant translating code written in the R tidyverse to the equivalent in pandas and numpy. But one larger issue was the reliance of our R implementation on the quantreg package for quantile regression. Unfortunately, we couldn’t find a numerically stable Python implementation of quantile regression. (Numerical instability can lead to inconsistent results.)</p><p>So we decided to write one ourselves, and we’ve open-sourced that too. You can find our implementation (as elex-solver) below. It uses the cvxpy optimization library. We hope you might find it useful as a stand-alone package if you need to do quantile regression in Python.</p><p><a href="https://github.com/washingtonpost/elex-solver">GitHub - washingtonpost/elex-solver</a></p><h3><strong>From a Hosted API to a Serverless Lambda</strong></h3><p>Now, the model is deployed as a Python package that we invoke via a serverless Lambda function.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*8TEpn3v3AP2ZPbTi" /><figcaption><em>Simplified diagram of our updated election pipeline with the model implemented as a Lambda</em></figcaption></figure><p>We deployed this model as a Python package because this makes it easy for collaborators both inside and outside of the company to install and use it. 
Our formatters from various election results vendors are Python packages as well, so this follows our existing architectural norm and simplifies our infrastructure.</p><p>When the model was formerly deployed as an API hosted on an EC2 instance, we pinged the API at a set rate of once every minute. However, we often ran into timeout issues. Invoking each model run as a serverless Lambda function solves this, because the Lambda maximum timeout is 15 minutes, which is well over our needs. Additionally, we have future plans to further align the model with our existing architecture by invoking the model only when new results come in instead of at a fixed rate. The architecture diagram above reflects this.</p><p>Importantly, we are able to run Lambdas independently of the rest of our pipeline (staging and production state machines). As a result, we preserve the flexibility of testing model runs with varying input parameters on election night, so we can tweak these parameters as we see fit as more information trickles in throughout the night. We provide ample logging per invocation of the model and use AWS CloudWatch to monitor the model throughout election nights.</p><p>This model is a continuous work in progress. You can browse the code yourself here. Installation instructions are included in the README.</p><p><a href="https://github.com/washingtonpost/elex-live-model">GitHub - washingtonpost/elex-live-model: a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race</a></p><h3><strong>Additional Reading</strong></h3><p>Over the course of the last two years, we have published multiple resources to share our progress. 
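<p>As a rough sketch of the serverless pattern described above, a Lambda entry point that performs one model run per invocation might look like the following. The event fields, the stub return payload and the election identifier are hypothetical illustrations, not The Post’s actual interface.</p>

```python
import json

def handler(event, context):
    """Hypothetical Lambda entry point: one model run per invocation."""
    election_id = event["election_id"]
    estimands = event.get("estimands", ["turnout"])
    # In production, this is where the model package would be invoked with the
    # event's parameters; here we return a stub payload in its place.
    return {
        "statusCode": 200,
        "body": json.dumps({"election_id": election_id, "estimands": estimands}),
    }

# Each invocation is independent, so staging and production pipelines can call
# the same function with different parameters and log each run separately.
result = handler({"election_id": "general-2020", "estimands": ["president"]}, None)
```

<p>Because every run is a fresh, independent invocation, tweaking input parameters on election night amounts to sending a different event, with no long-lived server state to manage.</p>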
We will continue to share our work publicly, and we welcome feedback!</p><ul><li><strong>October 2020: </strong><a href="https://s3.us-east-1.amazonaws.com/elex-models-prod/2020-general/write-up/election_model_writeup.pdf">“How The Washington Post Estimates Outstanding Votes for the 2020 Presidential Election”</a><strong> </strong>by John Cherian and Lenny Bronner</li><li><strong>November 1, 2020: </strong>“<a href="https://www.washingtonpost.com/politics/2020/11/01/post-election-model-results/">What the Post’s election results will look like”</a> by Jeremy Bowers, Lenny Bronner, and Terri Rupar</li><li><strong>November 2020: </strong><a href="https://github.com/washingtonpost/2020-election-night-model">GitHub repository for the original election night model</a> used in the 2020 elections</li><li><strong>December 2020: </strong>“<a href="https://s3.us-east-1.amazonaws.com/elex-models-prod/elex-models-prod/2020-general/write-up/election_model_writeup_update1.pdf">An Update To The Washington Post Election Night Model”</a> by John Cherian and Lenny Bronner</li><li><strong>February 21, 2021: </strong><a href="https://washpost.engineering/how-the-washington-post-estimates-outstanding-votes-for-the-2020-presidential-election-3f82f8415eda">“How The Washington Post Estimates Outstanding Votes for the 2020 Presidential Election”</a> by Lenny Bronner</li><li><strong>November 2, 2021: </strong><a href="https://www.washingtonpost.com/elections/2021/11/02/election-model-explained/">“How The Washington Post will model possible outcomes in the Virginia governor’s race”</a> by Lenny Bronner</li><li><strong>May 17, 2022: </strong><a href="https://www.washingtonpost.com/politics/2022/05/17/post-election-night-model/">“How the Washington Post’s election night model works”</a> by Lenny Bronner</li><li><strong>May 24, 2022: </strong><a href="https://www.washingtonpost.com/elections/interactive/2022/how-election-models-work/">“How election modeling can help us understand who might win”</a> by Adrian Blanco and Artur Galocha</li></ul><p>If you’re interested in this work and want to learn more, we recommend reading <a href="https://washpost.engineering/what-the-washington-post-elections-engineering-team-had-to-learn-about-election-data-a41603daf9ca">this starter guide</a> on what our team had to learn about election data or reaching out to us.</p><h4><strong>Acknowledgements</strong></h4><p>This work would not be possible without the many, many people who have collaborated with us and supported us along the way. Thank you to the elections engineering team for their contributions to this project: Jeremy Bowers, Dana Cassidy, John Cherian, Holden Foreman, Dylan Freedman, Chloe Langston, Brittany Mayes, Anthony Pesce, Erik Reyna and Chris Zubak-Skees.</p><p>Thank you to Dylan Freedman for editing this post.</p><hr><p><a href="https://washpost.engineering/were-open-sourcing-our-live-election-night-model-a21bcb2a46c6">We’re open sourcing our live election night model!</a> was originally published in <a href="https://washpost.engineering">Washington Post Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>