Analyzing the “Anya Taylor-Joy Effect”

The release of The Queen’s Gambit on Netflix in October 2020 coincided with a massive global spike in chess interest. I would like to explore if there has been any relevant effect that is measurable and attributed to the Netflix show. Was this just a continuation of the “pandemic hobby” trend, or did Beth Harmon truly change the game?

In this post, I’ll describe causal inference toy analysis hosted in my queens_gambit_causal_impact repository, which attempts to isolate the “Netflix effect” from any other change that may have influenced.


0. Summary

This analysis quantifies the specific impact of the Netflix miniseries The Queen’s Gambit on the popularity of chess. I’ll use Bayesian structural time-series models, moving beyond simple line charts to estimate a counterfactual: What would chess interest have looked like in late 2020 if the show had never been released? Using Wiki page visits, the analysis provides a statistical smoking gun for the show’s influence.

1 Why Causal Impact?

Traditional metrics often fail when multiple variables shift at once. In late 2020, we had two massive drivers for chess: the ongoing COVID-19 lockdowns and the release of the show.

We use the CausalImpact methodology because it is autoregressive and simple to use when we have limited data. By picking variables that are correlated with our target (chess) but unaffected by the “treatment” (the show), such as general interest in other board games, we can predict the baseline trend. The difference between this predicted baseline and the actual observed data is the “causal impact”.

2 The Queen’s Gambit Phenomenon

When The Queen’s Gambit premiered on October 23, 2020, it became a cultural juggernaut. However, chess was already growing in early 2020 due to the pandemic. The challenge of this analysis is to separate these two waves. As noted in the project, the “treatment” is defined precisely as the show’s release date, allowing us to see if the slope of interest changed significantly enough to be considered a direct result of the series.

3 Methodology

3.1 Data Acquisition

The study pulled daily Wikipedia pageviews for games like Chess, Backgammon, and other hobbies/games. We’ll use these for our estimation. This initial exploration was crucial because it confirmed that a simple “before vs. after” comparison would be flawed; we had to account for the fact that chess was already trending upward due to global lockdowns before the show even premiered.

date Backgammon Chess Gardening Guitar Origami Painting Piano Rubik’s Cube Sudoku Yoga
2018-01-01 3239 4906 485 2309 1031 1480 1941 2361 1803 3986
2018-01-02 2480 5270 568 2405 1087 2154 2335 2303 2097 4981
2018-01-03 2228 5040 540 2504 1265 2057 2295 2322 1971 4867
2018-01-04 2181 5346 569 2575 1142 2149 2149 2337 1791 4932
2018-01-05 2153 5599 501 2343 1202 2159 2177 2346 1922 4520

3.2 Stationarity and Pre-processing

Before modeling, it is essential to examine the properties of the time series. In the notebook, I looked at the trends to ensure we weren’t being misled by seasonal noise or erratic outliers. While the Causal Impact package (based on BSTS) handles non-stationary data better than traditional OLS regressions, checking for stationarity and understanding the underlying growth components helps in selecting the correct “pre-intervention” period. This ensured the model was trained on a stable relationship between the target and the predictors.

3.3 Synthetic Control creation

To isolate the “Netflix effect”, I constructed a Synthetic Control. This involved selecting the suite mentioned in 3.1. I fitted a Linear Regression on the other “hobbies” by predicting Chess. The logic here, as commented in the code, is that these terms share the same “lockdown DNA” as chess but have no reason to spike because of a TV show about a Grandmaster. By combining these variables, the model creates a “Synthetic Chess” by taking the games that were mostly showing more correlation with chess.

date Backgammon Chess Gardening Guitar Origami Painting Piano Rubik’s Cube Sudoku Yoga Chess_synthetic
2018-01-01 3239 4906 485 2309 1031 1480 1941 2361 1803 3986 6087.935964
2018-01-02 2480 5270 568 2405 1087 2154 2335 2303 2097 4981 5544.680612
2018-01-03 2228 5040 540 2504 1265 2057 2295 2322 1971 4867 5273.755123
2018-01-04 2181 5346 569 2575 1142 2149 2149 2337 1791 4932 5048.938634
2018-01-05 2153 5599 501 2343 1202 2159 2177 2346 1922 4520 5057.021750

3.4 Causal Impact

With the data prepared, I applied the CausalImpact algorithm. This technique uses a Bayesian structural time-series model to predict the counterfactual. Instead of a simple linear projection, the model uses the behavior of the synthetic control (the other board games) to estimate what the chess trend should have looked like post-October 2020.

3.5 Placebo Tests

The model then predicted what chess interest should have been how chess and the other board games were performing before the release date. I performed a total of 10 placebo tests by picking random dates before the show and saw the results.

4 Findings

By looking at the exploratory time series, it looks that chess page visits spiked from averaging 4.5k in May-Oct 2020 to 9k in Nov-Apr 2020-21, after the movie release. From the picture below, you can immediately notice that it looks there’s been an impact, right?

Alt text

But is this 102% lift really causal? If yes, how much?

I applied the Causal Impact function and you can see the result below:

Alt text

With the function impact.summary(‘report’), you can read the explanation provided directly by the library.

During the post-intervention period, the response variable had an average value of approx. 8990.02. By contrast, in the absence of an intervention, we would have expected an average response of 4059.16. The 95% interval of this counterfactual prediction is [3154.04, 4968.13]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 4930.86 with a 95% interval of [4021.89, 5835.98].

The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +121.48%. The 95% interval of this percentage is [99.08%, 143.77%].

This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (4930.86) to the original goal of the underlying intervention.

The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.0). This means the causal effect can be considered statistically significant.

5 The importance of Placebo tests

To validate these findings, I used Placebo Tests (or “A/A testing” in a temporal sense). I ran the same model 10 times on a date before the show was released—where we know no treatment occurred—we can check if the model incorrectly finds an effect.

As highlighted in the project’s logic, if the model shows a “causal impact” on a random date in July 2020, our model is likely picking up noise. Because the placebo tests in this study showed no significant impact, we can have much higher confidence that the October spike was truly caused by the show. See picture below, the red line is the result given by our model and the blue bars are the placebo test results:

Alt text


Sources:

Kay H. Brodersen, Fabian Gallusser, Jim Koehler, Nicolas Remy, Steven L. Scott - Inferring causal impact using Bayesian structural time-series models - 2015

Matheus Facure - Causal Inference in Python - 2023