
Welcome

After the 2020 election, I (like everyone) heard allegations of fraud and “cheating”. I wanted to see for myself how absurd the claims were. What I found back then, I am sharing publicly here for the first time. What I saw in this data (and the cascade of events around it) fundamentally changed how I see the world.

The Findings

I founded a VC-backed Machine Learning and Data Analytics startup focused on the Information Security market. So in December 2020 (after hearing all the controversy) I fired up some of my data analytics tools and wrote some simple custom code to take a closer look at voter data from a contested battleground state: Pennsylvania.

The TL;DR

This is some of what I found:

  • Ballots were returned (filled out and sent back to the state) with a DoB showing the voter as under 18 years old. Count: 2

  • Ballots were returned (filled out and sent back to the state) with a DoB showing the voter as over 100 years old. Count: 1,558

  • Ballots were returned (filled out and sent back to the state) with a DoB showing the voter as over 120 years old. Count: 93

  • Ballots were marked as received by the state with dates BEFORE they were mailed out by the state. Count: 23,305

  • Ballots were marked as received by the state the SAME DAY they were mailed to the voter by the state. Count: 34,916
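For readers who want to try checks like these themselves, here is a minimal sketch using only the Python standard library. It is not the code referenced on this page. The column names, the MM/DD/YYYY date format, and the choice to measure age as of Election Day (November 3, 2020) are all assumptions to adjust against the real CSV header, and the four sample rows are invented purely to exercise the logic.

```python
import csv
import io
from datetime import date, datetime

# ASSUMPTION: hypothetical column names and date format; check the real CSV header.
DOB, MAILED, RETURNED = "Date of Birth", "Ballot Mailed Date", "Ballot Returned Date"
ELECTION_DAY = date(2020, 11, 3)  # ASSUMPTION: reference date for computing age

def parse(value):
    """Parse MM/DD/YYYY; return None for blank, missing, or malformed values."""
    try:
        return datetime.strptime(value.strip(), "%m/%d/%Y").date()
    except (ValueError, AttributeError):
        return None

def age_on(dob, day):
    """Whole years elapsed between dob and day."""
    return day.year - dob.year - ((day.month, day.day) < (dob.month, dob.day))

def tally(rows):
    """Count returned ballots that trip each of the five checks."""
    counts = {"under_18": 0, "over_100": 0, "over_120": 0,
              "returned_before_mailed": 0, "returned_same_day": 0}
    for row in rows:
        dob, mailed, returned = (parse(row.get(k, "")) for k in (DOB, MAILED, RETURNED))
        if returned is None:
            continue  # only ballots with a recorded return date are considered
        if dob is not None:
            age = age_on(dob, ELECTION_DAY)
            counts["under_18"] += age < 18
            counts["over_100"] += age > 100   # note: also includes the over-120 rows
            counts["over_120"] += age > 120
        if mailed is not None:
            counts["returned_before_mailed"] += returned < mailed
            counts["returned_same_day"] += returned == mailed
    return counts

# Invented toy rows (NOT real records), one per kind of anomaly plus one unreturned ballot.
SAMPLE = """Date of Birth,Ballot Mailed Date,Ballot Returned Date
01/01/1800,10/05/2020,10/20/2020
06/15/2005,10/05/2020,10/05/2020
03/02/1970,10/10/2020,10/01/2020
07/04/1985,10/05/2020,
"""

if __name__ == "__main__":
    print(tally(csv.DictReader(io.StringIO(SAMPLE))))
```

To run it against a downloaded file, replace the `io.StringIO(SAMPLE)` argument with an open file handle, for example `open("rows.csv", newline="")`. The counts will depend on the reference date and on how blank or malformed dates are treated, so they may not match the figures above exactly.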

Ok, now that you’ve seen some of it, you’ve gotta be thinking what I thought:

“Surely there is a rational explanation for this…”

or

“These are just clerical errors…”

If you are thinking the way I did, you should know that it was these simple questions that began a long “walkabout”, searching for explanations from public officials and news outlets.

It’s been nearly FOUR years of continuous homework. I’m no longer the same person, but it’d be nice to know what happened.

I have some longer thoughts here on this.

If you don’t see the big deal with the above data, this is where cognition plays a key role. You have to think a bit like a risk analyst or an auditor. If you aren’t accustomed to this, you might have to slow your mind down a bit, because it can be subtle. We have to pause and think critically about the data because there is no guide:

  • Why would someone build a process to collect this data only to also allow it to be inaccurate?
  • What are some possible circumstances where inaccurate data of this kind can arise?
  • Processes must be built and staff must be trained to interact with this data.
  • Software and database schemas have to be written to support this data. Why would you make the effort to store all this data and then not validate it?

As you mentally play with the data, fields, and scenarios, more things come to mind. Smarter people will think of better ways to interrogate the data than I did. The above are just a few of the initial things that alarmed me. This is why I share my code and the data below. Other queries have interesting results:

  • Duplicate entry detection (using various combinations of columns)
  • Birthday Paradox stats on Dates of Birth
  • etc
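The two queries named above can be sketched in a few lines of standard-library Python. This is one possible reading of them, not the code referenced on this page. The function names are hypothetical, the column names passed as `keys` are whatever the CSV header actually uses, and the expected-pairs figure assumes birthdays are uniform and independent across 365 days, which real populations only approximate.

```python
from collections import Counter

def duplicate_groups(rows, keys):
    """Return {key_tuple: count} for column combinations that occur more than once."""
    seen = Counter(tuple(row.get(k, "") for k in keys) for row in rows)
    return {combo: n for combo, n in seen.items() if n > 1}

def expected_shared_pairs(n, days=365):
    """Expected number of pairs sharing a birthday among n people,
    ASSUMING birthdays are uniform and independent over `days` days."""
    return n * (n - 1) / 2 / days

def observed_shared_pairs(birthdays):
    """Count pairs of records that share the same birthday value."""
    counts = Counter(birthdays)
    return sum(c * (c - 1) // 2 for c in counts.values())

if __name__ == "__main__":
    # Invented toy rows (NOT real records); column names are placeholders.
    rows = [
        {"county": "A", "dob": "01/01/1950"},
        {"county": "A", "dob": "01/01/1950"},
        {"county": "B", "dob": "01/01/1950"},
    ]
    print(duplicate_groups(rows, ["county", "dob"]))
    print(expected_shared_pairs(23))
    print(observed_shared_pairs([(1, 1), (1, 1), (1, 1), (2, 2)]))
```

Comparing the observed pair count to the expected one gives a rough signal of whether some dates of birth are over-represented. Trying several `keys` combinations matters for the duplicate check, because a match on date of birth alone is common in a large population, while a match on many columns at once is much rarer.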

For brevity’s sake, I’ll stop with the analysis here. I have more thoughts on a dedicated page.

The Data

You’re probably curious about where I got it and how I processed it. As of the time of publishing, and for the last 4 years (I have checked periodically), it has been available in Archive.org’s archive of the Pennsylvania OpenData page (which was briefly moved or removed shortly after the election controversy).

Using the PAOpenData interactive visualization tool:

You can see all of Archive.org’s snapshots of the site here:

https://web.archive.org/web/20240000000000*/https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

Here is a specific snapshot:

https://web.archive.org/web/20201118054516/https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

Click “View Data” near the top.

NOTE: keep in mind this is Archive.org’s archive of the site, so it may take a long time to load. Once loaded, it will look like this: (screenshot)

The original PAOpenData link is here: https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

Downloading the data yourself:

Here is a direct link: https://web.archive.org/web/20201118054516mp_/https://data.pa.gov/api/views/mcba-yywm/rows.csv?accessType=DOWNLOAD

https://web.archive.org/web/20201113141731mp_/https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm/data
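If you would rather script the download than click through, the sketch below rebuilds the direct link from its parts and streams it to disk using only the standard library. The helper names `wayback_url` and `download`, the output filename, and the timeout are hypothetical choices. The snapshot timestamp, the `mp_` modifier, and the dataset URL are copied from the direct link above.

```python
import shutil
import urllib.request

WAYBACK = "https://web.archive.org/web"

def wayback_url(timestamp, original_url, modifier="mp_"):
    """Build a Wayback Machine URL of the form used in the direct link above."""
    return f"{WAYBACK}/{timestamp}{modifier}/{original_url}"

def download(url, dest_path, timeout=120):
    """Stream url to dest_path without loading the whole file into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path

CSV_URL = wayback_url(
    "20201118054516",
    "https://data.pa.gov/api/views/mcba-yywm/rows.csv?accessType=DOWNLOAD",
)

# Usage (not run automatically; the archive can be slow and the file is large):
# download(CSV_URL, "rows.csv")
```

Streaming with `shutil.copyfileobj` avoids holding the full CSV in memory. If the archive responds slowly, raising `timeout` or retrying later is usually enough.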

My code is available here:

You can even download my code and data here using Docker:

docker run -it -p 8888:8888 -p 6006:6006 sa7ori/pa2020 bash

For more technical information on the dataset and using my code see this dedicated page with more details.

A personal note

Before I started on all this, I had no idea how election administration worked. I had no idea that the data was even publicly available. I didn’t know what to expect. I assumed election data was treated with the same protections, tracking, and care as physical cash. I assumed all related records of ballots/votes would be as meticulously guarded as “account balances” or ledgers are for money. I thought there was a “Brinks truck” for votes. (I’ve spent the last 4 years getting mugged by reality.)

It wasn’t the findings in the raw data that I found most alarming. As time passed, what bothered me most (and shifted my worldview) was that NONE of my trusted sources of information offered a “thinking person’s” explanation of what had happened. All I found (from my trusted sources) were generalizations, vapid proclamations, slogans, and repetitive mantras.

Furthermore, friends, family, and colleagues couldn’t care less.

I have more thoughts here on a dedicated page.


My Politics (if you’re curious):

Before 2020, I didn’t care about politics or think critically about world events. I thought it gauche to discuss politics or money in mixed company (or even on the internet).

I did not vote for any major party candidate in the 2020 General Election. Historically, I economically leaned Free Market/AnCap and I politically leaned libertarian (small “L”, with affinity for Mises Caucus, Ron Paul-types, and for a short time, 2016 Bernie-types).

Although I did not vote for a major party candidate in 2020, I plan to do so in 2024.


Brief Legal Information

All data archived herein (and in the accompanying technical resources) was obtained through Open Public Records, and is not considered PII (Personally Identifiable Information). All sources of data have been provided in this document to corroborate and substantiate the findings presented here. Additionally, all data was not only publicly provided by the Pennsylvania Department of State through the OpenDataPA web portal, but was also obtained through explicit DOWNLOAD links (not screen-scraped) and as such was cleansed of all PII by PA OpenData.