For those who want to “see for themselves”.

Downloading the data manually

The simplest way to view the data is to download the raw data directly and view it in Excel or OpenOffice (or some similar tool).

You can download the full CSV data directly from this archive.org snapshot (or from others like it): https://web.archive.org/web/20201115001813if_/https://data.pa.gov/api/views/mcba-yywm/rows.csv?accessType=DOWNLOAD

e566c7ae08a14af8b4f4042472b31277.png
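If you would rather script the download than click the link, something like the following should work. It is only a minimal sketch using the Python standard library: the URL is the snapshot link quoted above, while the function name download_csv and the output filename pa_mail_ballots.csv are placeholders I made up for illustration.

```python
import shutil
import urllib.request
from pathlib import Path

# Snapshot URL copied from the text above.
SNAPSHOT_URL = (
    "https://web.archive.org/web/20201115001813if_/"
    "https://data.pa.gov/api/views/mcba-yywm/rows.csv?accessType=DOWNLOAD"
)


def download_csv(url: str, dest: str) -> Path:
    """Stream the file at `url` to `dest` and return the destination path."""
    dest_path = Path(dest)
    # copyfileobj copies in chunks, so the whole file is never held in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Example call (left commented out because archive.org can be slow to respond;
# the filename is a hypothetical choice):
# download_csv(SNAPSHOT_URL, "pa_mail_ballots.csv")
```

The download is streamed to disk rather than read into a single string, which seemed the safer choice for a statewide dataset.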

What the data looks like

…in a text editor like Notepad.

000564228e6047ccba2ac2401d08a995.png
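If the file is too large for your text editor to open comfortably, you can print just the first few lines instead. This is a rough sketch, not part of the original notebook; the helper name peek and the filename pa_mail_ballots.csv are hypothetical, and UTF-8 encoding is an assumption about the file.

```python
from itertools import islice


def peek(path: str, n: int = 5) -> list:
    """Return the first `n` lines of a text file without reading the rest."""
    # errors="replace" keeps the preview working even if the encoding guess
    # (UTF-8, an assumption) turns out to be wrong for a few bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Example call (hypothetical filename): show the header row and four records.
# for line in peek("pa_mail_ballots.csv"):
#     print(line)
```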

Be sure to read the descriptions of the data fields

For example, the Date of Birth January 1st, 1800 occurs in the dataset. This may look alarming at first, but the field descriptions explain that it is a “protection” for certain voters (such as victims of domestic violence). In all, I counted only ~60 of these.

3ef5f2e498ab442c97065d94941e368c.png

8f685fa406eb4510b9e52cbcf104342e.png
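You can reproduce this kind of count yourself. The sketch below uses only the Python standard library (the notebook in the Docker container uses Pandas instead) and rests on two assumptions you should check against the field descriptions and the header row of your download: that the column is named "Date of Birth" and that dates are written as MM/DD/YYYY. The function name and the filename are hypothetical.

```python
import csv
from datetime import datetime


def count_placeholder_dobs(path, column="Date of Birth", date_format="%m/%d/%Y"):
    """Count rows whose date of birth is the 1800-01-01 placeholder.

    `column` and `date_format` are assumptions about the CSV layout.
    """
    placeholder = datetime(1800, 1, 1)
    count = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {path}")
        for row in reader:
            raw = (row.get(column) or "").strip()
            try:
                parsed = datetime.strptime(raw, date_format)
            except ValueError:
                continue  # blank or unparseable date: skip the row
            if parsed == placeholder:
                count += 1
    return count


# Example call (hypothetical filename):
# print(count_placeholder_dobs("pa_mail_ballots.csv"))
```

Rows are read one at a time, so this works on the full file without loading it all into memory. Your count may differ from mine depending on which snapshot you downloaded.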

Using the PA OpenData viz in your browser

The second-easiest way to view the data is using the PA OpenData visualization tool in your web browser. It offers some user-friendly sorting, querying, and graphing functions. The original PA OpenData link is here: https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

You can see all of Archive.org’s snapshots of the PA OpenData site here:

https://web.archive.org/web/20240000000000*/https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

Here is a specific snapshot:

https://web.archive.org/web/20201118054516/https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

Click “View Data” near the top.

53fb22439b7743bd8f5a22c8e1b0ac3f.png

NOTE: keep in mind this is Archive.org’s archive of the site, so it may take a long time to load in your browser.

Once it has loaded, it will look like this:

d388a2cfee1b4073a5226f49fa415197.png

Running the Docker container:

The Docker container contains the code I wrote to explore the dataset. It is self-contained and has all the dependencies already installed. This will allow you to immediately get to work tinkering with the data and writing your own code if you are so inclined. It is the same code that was used to generate the screenshots in the main writeup.

It also contains a sampling of the data itself, but you can also download the full data yourself and use that instead.

Tools Used:

  • Jupyter Notebooks
  • Python
  • Pandas
  • PyTorch
  • Matplotlib (only for heatmaps and scatterplots not shared in this post)

How to run the Docker container

docker run -it -p 8888:8888 -p 6006:6006 sa7ori/pa2020 bash

Docker will automatically download the container image:

6955e473c4a44f1499437b0109ad5bdf.png

Once the download is complete, the container will run and drop you into a root shell:

67c920f85f24426cb89675488fdbe43a.png

Run the run_jupyter.sh shell script, then open your browser to http://localhost:8888 and click on the PA 2020 Notebook:

efbe8fd0d3c9447fba72d31fcc127a15.png

You are now in control:

9f9b4ea9a1ad4710b4412a09a17c95ca.png