The toolkit aims to help policy-makers, developers and data scientists identify and respond to key trends in the pandemic, such as correlations between poverty levels and infection rates.
IBM’s Centre for Open-Source Data and AI Technologies (CODAIT) has launched a toolkit that aims to help policy-makers, developers and data scientists identify and respond to key trends in the coronavirus pandemic.
The rationale behind Covid notebooks is that freeing developers and data scientists from mundane tasks, such as handling data formats and data cleaning, lets them concentrate on advanced analysis and modelling.
The toolkit is also intended as a jumping-off point for more in-depth analysis. For instance, it could be used to find correlations between poverty levels and infection rates.
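As a rough illustration of that kind of follow-on analysis, the sketch below normalises case counts to infections per 100,000 residents and computes a Pearson correlation coefficient across counties. It is not taken from IBM's notebooks; the function names are hypothetical, and it assumes you already have one poverty rate and one infection rate per county from whatever sources you trust.

```python
import math


def infection_rate_per_100k(cases: int, population: int) -> float:
    """Confirmed cases per 100,000 residents, so counties of different sizes are comparable."""
    return cases / population * 100_000


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel out in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)
```

Each county contributes one (poverty rate, infection rate) pair. A coefficient near +1 would suggest poorer counties tend to have higher infection rates, but on its own it says nothing about cause; drawing statistically valid conclusions would mean controlling for factors such as population density and testing levels.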
Writing in a blog post about the release of Covid notebooks, Frederick Reiss, chief architect at CODAIT, said: “For data scientists and policy-makers who are analysing the effects of Covid-19 and trying to come up with actionable plans based on data, the information landscape is overwhelming.”
He continued: “A near-constant flow of data from research studies, news outlets, social media, and health organisations make the task of analysing data into useful action nearly impossible. Developers and data scientists need answers to their questions about data sources, tools, and how to draw meaningful and statistically valid conclusions from the ever-changing data.”
Policy-makers face similar challenges, IBM notes. The US has more than 3,000 counties, each with a unique story of how Covid-19 is impacting its community. According to IBM, policy-makers are asking questions such as: What stories can we tell in the aggregate? Are there patterns we see across the country? Which regions or demographics are being affected most by the pandemic?
The toolkit uses developer-friendly Jupyter notebooks to cover each of the initial data analysis steps, and creates data processing pipelines using the Elyra Notebook Pipelines Visual Editor and Kubeflow Pipelines.
The tools in the repository draw on authoritative sources to arrive at aggregate insights that policy-makers can use to make real-time, critical decisions. For county-level data about the US, a data extraction notebook downloads the latest data from the Covid-19 Data Repository by the Centre for Systems Science and Engineering (CSSE) at Johns Hopkins University.
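The Johns Hopkins time series are typically published in a "wide" layout, with one row per county and one column per date. Analysis is usually easier in a "long" layout with one record per county per day. The sketch below shows that reshaping step using only Python's standard library. It is not IBM's extraction notebook. The column names FIPS, Admin2 (county) and Province_State, and the m/d/yy date headers, are assumptions about the CSSE file layout, and the caller supplies the download URL.

```python
import csv
import io
import urllib.request
from datetime import date, datetime
from typing import Iterator, NamedTuple, Optional, TextIO


class CountyCount(NamedTuple):
    fips: str
    county: str
    state: str
    day: date
    cumulative_cases: int


def fetch_csv(url: str) -> str:
    """Download a CSV file and return its text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def _parse_date(header: str) -> Optional[date]:
    """Return a date for headers such as '3/15/20', or None for identifier columns."""
    try:
        return datetime.strptime(header, "%m/%d/%y").date()
    except ValueError:
        return None


def to_long_format(stream: TextIO) -> Iterator[CountyCount]:
    """Reshape one-row-per-county, one-column-per-date CSV data into one record per county per day."""
    reader = csv.DictReader(stream)
    # Assumed layout: identifier columns first, then one column per date.
    date_columns = []
    for column in reader.fieldnames or []:
        day = _parse_date(column)
        if day is not None:
            date_columns.append((column, day))
    for row in reader:
        for column, day in date_columns:
            yield CountyCount(
                fips=row.get("FIPS", ""),
                county=row.get("Admin2", ""),
                state=row.get("Province_State", ""),
                day=day,
                # Blank cells are treated as zero; some files store counts as "12.0".
                cumulative_cases=int(float(row[column] or 0)),
            )
```

In use, something like `list(to_long_format(io.StringIO(fetch_csv(url))))` would produce the long table. Keeping the download separate from the parsing means the reshaping logic can be tested offline on a small inline sample.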
This dataset is the primary source for many of the predictive models used by organisations working with the US Centres for Disease Control and Prevention (CDC).
The notebook fills in known gaps in this primary source with additional data from the New York Times’ Coronavirus (Covid-19) Data in the United States repository (for more complete data on Rhode Island and Utah) and from New York newspaper The City’s digest of the daily reports from the New York City Department of Health and Mental Hygiene (for borough-level data on New York City). For worldwide data at the level of individual countries, the toolkit uses the European Centre for Disease Prevention and Control’s data on the geographic distribution of Covid-19 cases.
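Gap-filling of this kind amounts to merging a primary and a secondary source with a precedence rule. The sketch below shows one plausible rule; it is not necessarily the one IBM's notebook applies. The primary value wins unless it is missing, and for regions flagged as known problem areas (Rhode Island and Utah, in the case described above) the secondary source is preferred outright. The function name and the (region, date) key format are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key format: (region name, ISO date string), e.g. ("Utah", "2020-05-01").
Key = Tuple[str, str]


def fill_gaps(
    primary: Dict[Key, Optional[int]],
    secondary: Dict[Key, int],
    override_regions: Iterable[str] = (),
) -> Dict[Key, int]:
    """Merge two case-count sources, preferring the primary except where it is missing
    or where the region is flagged to use the secondary source instead."""
    overrides = set(override_regions)
    merged: Dict[Key, int] = {}
    for key in primary.keys() | secondary.keys():
        region, _ = key
        primary_value = primary.get(key)
        secondary_value = secondary.get(key)
        if region in overrides and secondary_value is not None:
            merged[key] = secondary_value  # known gap: trust the secondary source
        elif primary_value is not None:
            merged[key] = primary_value  # default: keep the primary source
        elif secondary_value is not None:
            merged[key] = secondary_value  # primary missing: fall back
    return merged
```

Keeping the override list explicit, rather than silently mixing sources everywhere, makes it easy to see which figures came from which source.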
IBM said it believes in the importance of democratising technology and equipping developers with the most up-to-date datasets and tools, which can help policy-makers make the best-informed decisions for citizens’ wellbeing.
Developers and data scientists can also contribute directly to the tools that they use to perform an analysis by making pull requests to IBM’s GitHub repository.