I don’t need to remind you that the world has changed dramatically since the worldwide spread of the COVID-19 novel coronavirus; millions are suddenly unemployed, or else working at home. If you suddenly find yourself in one of these situations, this can be a good time to pick up some new coding and data science skills while helping to investigate the cause of our current crisis.
You’ll see many statistics and analyses regarding COVID-19 getting thrown around online, but you don’t necessarily have to take their word for it. Many large sets of COVID-19 data are publicly available. If you’re one of the many who suddenly find themselves with a lot more free time on their hands, now could be a great opportunity to learn some new things while playing around with real-world data sets. It’s not likely that you’ll be generating insights beyond those of professional epidemiologists, but it never hurts to have more eyes on the data. Below is a list of a few compiled sources of COVID-19 information.
COVID-19 Open Research Dataset
The White House and a coalition of research institutions have made the COVID-19 Open Research Dataset (CORD-19) available for download. This dataset consists of 59,000 journal articles (47,000 with full text) on COVID-19, SARS-CoV-2 and other coronaviruses. This represents a remarkably large collection of freely accessible scientific literature, so if you’re looking for a large dataset to try learning some text mining techniques, this is as good a place to start as any.
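As a first taste of text mining, here is a minimal sketch that counts how often each term appears across a handful of titles. The titles, the stopword list, and the function name term_counts are all made up for illustration; they are not taken from CORD-19, and a real analysis would read the dataset's own files instead.

```python
import re
from collections import Counter

# Hypothetical stand-in titles, NOT taken from CORD-19.
DOCS = [
    "Coronavirus transmission in hospital settings",
    "Modeling transmission of a novel coronavirus",
]

# A tiny, assumed stopword list; real projects use much longer ones.
STOPWORDS = {"in", "of", "a", "the", "and"}


def term_counts(docs):
    """Count lowercase word occurrences across all documents, skipping stopwords."""
    counts = Counter()
    for doc in docs:
        # Split into runs of letters, digits, and hyphens.
        words = re.findall(r"[a-z0-9-]+", doc.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


if __name__ == "__main__":
    for term, n in term_counts(DOCS).most_common(3):
        print(term, n)
```

Simple counts like these are only a starting point, but they are often enough to see which topics dominate a collection before trying anything fancier.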
Johns Hopkins COVID-19 Data Repository
The folks at the Johns Hopkins University Center for Systems Science and Engineering are the ones behind the popular ArcGIS global COVID-19 tracking map. The curated set of data that they use to produce this map is available here. This page has a handy list of all of the project’s data sources, including the WHO, CDC, and various international government statistics, in case you wanted to do more sleuthing on your own. This collection previously only contained case and death counts by country, with a few select U.S. regions delineated (e.g. New York City), but has recently been upgraded to also report county-level data like the New York Times data set mentioned below. This git repository is automatically updated on a daily basis, so don’t forget to git pull often.
New York Times U.S. COVID Data
The New York Times has its own set of interactive graphics on the COVID-19 outbreak, and like the JHU site mentioned above, the company has made the backend data for these graphics freely available. This data set tracks reported COVID-19 cases and deaths at the national, state, and county levels, and is updated on a daily basis.
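A common first exercise with case data like this is turning running totals into daily new cases. The sketch below assumes a CSV with date, state, cases, and deaths columns and assumes the counts are cumulative; check the actual files before relying on either assumption. The sample rows and state names are invented for illustration and are not real figures.

```python
import csv
import io

# Made-up sample rows for illustration only; these are not real figures.
# Column names and cumulative counts are assumptions about the file layout.
SAMPLE_CSV = """date,state,cases,deaths
2020-03-01,Exampleland,10,0
2020-03-02,Exampleland,15,1
2020-03-03,Exampleland,27,1
2020-03-01,Otherstate,3,0
2020-03-02,Otherstate,3,0
"""


def daily_new_cases(csv_text):
    """Convert cumulative case counts into daily new cases per state."""
    rows = sorted(
        csv.DictReader(io.StringIO(csv_text)),
        key=lambda r: (r["state"], r["date"]),  # ISO dates sort correctly as text
    )
    previous = {}  # last cumulative total seen for each state
    result = []
    for row in rows:
        state = row["state"]
        total = int(row["cases"])
        # On a state's first row this is simply its total to date.
        new = total - previous.get(state, 0)
        previous[state] = total
        result.append((row["date"], state, new))
    return result


if __name__ == "__main__":
    for date, state, new in daily_new_cases(SAMPLE_CSV):
        print(date, state, new)
```

To try this on real data, replace SAMPLE_CSV with the text of a downloaded file. Reported totals are sometimes revised downward, so a negative daily value would not necessarily mean a bug.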
Finding Ways to Spend Your Time
As you can see, there are several different sources of COVID-19 data to play around with. Many other services make use of one or more of these sources. For instance, Tableau’s COVID-19 Data Hub includes a free trial for a starter workbook that uses the Johns Hopkins-compiled data mentioned above. This lets users rapidly prototype different virus-related visualizations, in case you want to try playing around with data without getting into too much hardcore coding.
If you are interested in learning more about coding, this could be a great time to jump into some free online training (perhaps even to join the endangered but very important ranks of COBOL programmers). Probably of more interest to most people are free courses in more modern languages, such as Google’s Python machine learning crash course. I do want to be clear here: it is okay to feel isolated, afraid or bored in these times, and it’s okay to respond to this pressure by bingeing Tiger King on Netflix in a single day. However, if you decide you would like to try something new, there’s always a need for more people who can find patterns underlying the phenomena that shape our world and society. If it makes us better prepared for the next pandemic, then all the better.