Data Mining COVID-19 Epidemics: Part 1

These days we are all following the statistics of COVID-19, looking at how our own country is faring and how it’s comparing with other countries. Luckily, only a few have a statistically meaningful number of deaths (which solemnly reminds us of the difference between statistical and practical significance!), so we concentrate on the number of confirmed cases.

You’re reading the first and most basic blog post in a series in which we will investigate this data using Orange. Most people are capable of doing something in Excel(-like programs), and some can do everything in Python with pandas and Jupyter. I’ll show you how many people can do many things in Orange.

Today, we will see how to get this data into Orange, draw some basic curves, and relate it to other data sources. Don’t expect anything dramatic; this will be more about showing some creative ways of connecting a few widgets, and starting to explore the data about COVID-19 epidemics.

Getting the data

Johns Hopkins University collated some COVID-19 information in a machine-readable format and published it on GitHub. We will examine the table with confirmed cases by region and country.

To get the numbers behind the above table, click “Raw”. Or this link: Save the page and you’ll have a file to play with.

Easier still, Orange’s File widget can load the data directly from the web, and handles a basic .csv file (for more fine-grained options, use CSV File Import).

So, for starters, add the File widget to the canvas and paste the above link into the URL field. You can then connect it to a Data Table widget and check that everything has loaded OK. You should see a table with rows corresponding to regions and countries, and columns corresponding to dates, plus two additional columns giving each region’s location (latitude, longitude).
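For readers who would like to see the same table layout outside Orange, here is a minimal sketch in plain Python. It does not download anything: the sample text, the header names (Province/State, Country/Region, Lat, Long), the date format and all the numbers are made-up assumptions that only mimic the layout described above, and load_wide_table is a hypothetical helper, not part of Orange.

```python
import csv
import io

# Made-up sample in the layout described in the post: one row per region,
# two location columns, then one column per date.
# Header names and all values are assumptions for illustration only.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Atlantis,10.0,20.0,0,1,3
North,Borduria,45.0,15.0,2,2,5
"""

# Columns that describe the row rather than hold a daily count (assumed names).
META_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def load_wide_table(text):
    """Parse wide-format CSV text into (date_columns, rows)."""
    reader = csv.DictReader(io.StringIO(text))
    # Every column that is not metadata is treated as a date column.
    date_columns = [name for name in reader.fieldnames
                    if name not in META_COLUMNS]
    rows = []
    for record in reader:
        rows.append({
            "region": record["Province/State"],
            "country": record["Country/Region"],
            "lat": float(record["Lat"]),
            "long": float(record["Long"]),
            # Confirmed cases in the same order as date_columns.
            "cases": [int(record[d]) for d in date_columns],
        })
    return date_columns, rows


date_columns, rows = load_wide_table(SAMPLE_CSV)
print(len(rows), "rows,", len(date_columns), "date columns")
for row in rows:
    print(row["country"], row["cases"])
```

This is roughly the check you do by eye in Data Table: the number of rows, the number of date columns, and whether the counts were read as numbers.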

Note that this is live data, so your results will differ from those we show here.

Plot the usual plots

Connect the File widget to a Line Plot widget to see the graph. By default, Line Plot shows means and ranges, whereas we are interested in the raw lines, so let’s set the checkboxes accordingly.
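To make the difference between the two views concrete, here is a small sketch in plain Python. It assumes that "range" means the minimum and maximum across rows at each date, which may not be exactly what Line Plot draws; the region names and counts are made up, and summarize is a hypothetical helper.

```python
from statistics import mean

# Made-up case counts: one list per region, one value per date.
series = {
    "Atlantis": [0, 1, 3],
    "Borduria": [2, 2, 5],
    "Freedonia": [1, 3, 4],
}


def summarize(series_by_region):
    """Return a (mean, low, high) tuple for each date across all regions."""
    # zip(*...) turns per-region rows into per-date columns.
    columns = zip(*series_by_region.values())
    return [(mean(col), min(col), max(col)) for col in columns]


summary = summarize(series)

# Summary view: one aggregate per date, individual regions are lost.
for day, (avg, low, high) in enumerate(summary):
    print(f"day {day}: mean={avg:.2f} range=({low}, {high})")

# Raw view: every region keeps its own curve.
for region, values in series.items():
    print(region, values)
```

The summary collapses all regions into one curve with a band, so a single fast-growing region disappears into the average; the raw lines keep each region visible, which is what we want here.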
