OMSCS Student Uses Machine Learning to Help Understand COVID-19

Contact

Tess Malone, Communications Officer

tess.malone@cc.gatech.edu

[Image: Ken Miller]

With dozens of research papers about COVID-19 being published each week, it can be difficult for doctors and scientists to read the most important studies.

A student at Georgia Tech, however, is using artificial intelligence (AI) techniques like natural language processing and machine learning (ML) to narrow down the most relevant information in this growing data set.

Kenneth Miller, a student in Georgia Tech’s Online Master of Science in Computer Science (OMSCS), is using these tools to develop algorithms to ensure that the most important COVID-19 research reaches doctors. His work is part of an ongoing challenge to use ML to empower the medical community to find the best COVID-19 studies.

Information Overload

The challenge started when Kaggle, a Google data science and ML community, partnered with the White House and several leading research groups to create the COVID-19 Open Research Dataset (CORD-19). With more than 47,000 scholarly articles about COVID-19 and other coronaviruses, it’s one of the most comprehensive research databases for the pandemic.

To sift through the data, Kaggle released CORD-19 to its community and asked members to use it to answer some of the toughest research questions about COVID-19. As an incentive, participants like Miller receive $1,000 in prize money for every task completed successfully.

As an OMSCS student specializing in ML, Miller has joined a few previous Kaggle challenges, but for much less significant tasks like predicting home values or NCAA brackets. For Miller, working on this dataset presented an especially relevant problem.

“I am fascinated with everything AI, so when I heard about this, I figured if any of my skills could help anyone, I should try,” said Miller, who is a lawyer outside of his studies.

Keep it Simple

Miller said his OMSCS studies prepared him for the challenge. The AI track focuses on the practical implementation of AI methods. This made it easier for Miller to start with an overwhelming amount of data and get to an endpoint that solves the problem. His experience using the programming language Python for class also let him work with the data nimbly.

Armed with this knowledge, Miller applied a strategy he uses on every project.

“Whenever I start a new project, I try and see if I can craft a simple yet effective solution from scratch,” he said.

He has worked on specific Kaggle challenges where he can apply this strategy. The first ML model Miller developed finds the most relevant sentences in a study. To accomplish this, he used a simple scoring algorithm that counts how many times keywords appear in a sentence. Then the model measures the ratio of keyword occurrences to sentence length.
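
The article does not publish Miller's code, but the scoring idea it describes can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `score_sentence` and `top_sentences`, the regular expressions used to split text into sentences and words, and the choice to measure sentence length in words.

```python
import re

# Hypothetical helper names; this is an illustrative sketch, not Miller's code.
WORD_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def score_sentence(sentence, keywords):
    """Return the ratio of keyword occurrences to sentence length (in words)."""
    words = WORD_RE.findall(sentence.lower())
    if not words:
        return 0.0
    keyword_set = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in keyword_set)
    return hits / len(words)


def top_sentences(text, keywords, n=3):
    """Split text into sentences (naively) and return the n highest-scoring ones."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(sentences, key=lambda s: score_sentence(s, keywords), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    sample = (
        "The incubation period is five days. "
        "The weather was nice. "
        "Incubation varies."
    )
    for sentence in top_sentences(sample, ["incubation"], n=2):
        print(sentence)
```

Dividing by sentence length keeps long sentences from ranking highly just because they contain more words, which is one plausible reason for using a ratio rather than a raw keyword count.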

For a separate challenge, Miller created a search engine for common COVID-19 research questions, such as: What is the average time the disease takes to incubate? How long is it contagious? How long until symptoms appear?

Up to the Challenge

These are just a few of Miller’s models, and he continues to work on new challenges Kaggle offers. Tasks now include deep dives into epidemiology, determining how many patients a study was based on, and identifying what scientific method was employed.

“The trick, as in any project like this, is understanding and assimilating the data to start with,” Miller said. “But using Python makes the initial data wrangling pretty easy. The hardest part is building new ways to squeeze more desired info out of the documents.”

Miller’s efforts have been noticed. His work has been cited several times on the contributions page.

For more coverage of Georgia Tech’s response to the coronavirus pandemic, please visit our Responding to COVID-19 page.

Additional Information

Groups

College of Computing

Status
  • Created By: Tess Malone
  • Workflow Status: Published
  • Created On: May 5, 2020 - 12:28pm
  • Last Updated: Jun 4, 2020 - 9:11am