Understanding How Data Scientists Understand Machine Learning Models

Contact

Kristen Perez

Communications Officer

Summaries

Summary Sentence:

CSE Ph.D. student Fred Hohman releases his latest software, Gamut, which aims to help data scientists understand machine learning outputs.

Media
  • Gamut - Visualization Software
    (image/png)

How do data scientists read and understand machine learning model outputs? This is the question that a new design probe built by a team of researchers led by School of Computational Science and Engineering (CSE) Ph.D. student Fred Hohman aims to answer.

“Without good models and the right tools to interpret them, data scientists risk making decisions based on hidden biases, spurious correlations, and false generalizations. This has led to a rallying cry for model interpretability,” said Hohman.

To address this issue, Hohman teamed up with U.C. Berkeley Ph.D. candidate Andrew Head and Microsoft researchers Rich Caruana, Robert DeLine, and Steven Drucker to create Gamut. Gamut is an interactive system designed to investigate how data scientists interpret models and how interactive interfaces can support them in answering questions about model interpretability.

“Machine learning is doing all this amazing work nowadays like cancer prediction, predicting fire risks in buildings, and poverty prediction via satellite images. But there are many applications where demographic bias such as gender, age, or race, is learned from data,” continued Hohman.

“That brings us to Gamut, which focuses on an area of machine learning called interpretability, which is essentially trying to understand what a machine learning algorithm has actually learned so data scientists can trust its predictions.”

Video: https://youtu.be/R-amW_yNX6I

The system pairs generalized additive models (GAMs), which combine high accuracy with an inherently intelligible structure, with interactive data visualization to display model results and predictions. The researchers use it to study how data scientists work with explainable interfaces when interpreting models.
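
To illustrate why a GAM's structure is considered intelligible: a GAM's prediction is an intercept plus a sum of one-variable functions, one per feature, so each feature's contribution can be read off on its own. The sketch below is a toy example only. The feature names, shape functions, and numbers are hypothetical, and it does not represent Gamut's implementation or any trained model.

```python
# Toy additive model: prediction = intercept + sum of per-feature contributions.
# All names and numbers here are made up for illustration; this is not Gamut's code.

def shape_sqft(square_feet):
    """Hypothetical shape function: contribution of house size (in $1000s)."""
    return 0.1 * (square_feet - 1500)

def shape_age(years):
    """Hypothetical shape function: contribution of house age, capped at 40 years."""
    return -0.5 * min(years, 40)

INTERCEPT = 200.0  # hypothetical baseline prediction (in $1000s)
SHAPE_FUNCTIONS = {"sqft": shape_sqft, "age": shape_age}

def explain(instance):
    """Return the prediction and the separate contribution of each feature."""
    contributions = {
        name: shape(instance[name]) for name, shape in SHAPE_FUNCTIONS.items()
    }
    prediction = INTERCEPT + sum(contributions.values())
    return prediction, contributions

prediction, contributions = explain({"sqft": 2000, "age": 10})
print(f"prediction: {prediction:.1f}")
for name, value in contributions.items():
    print(f"  {name}: {value:+.1f}")
```

Because the contributions simply add up, a visualization can plot each shape function separately and show how much each feature moved a single prediction away from the baseline.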

Surprisingly, while the term interpretability loosely describes a human understanding of some component of a model, no formal, agreed-upon definition has been reached about which component should be understood, according to Hohman. This is another reason why Gamut is a critical piece in solving the interpretability puzzle.

Rather than aiming to define interpretability, Hohman says Gamut instead aims to operationalize it, or turn the fuzzy concept of interpretability into something more easily usable and actionable.

“Since machine learning models are still being used despite their problems, the idea is that we can break interpretability down into a suite of techniques to help data scientists interpret models today. And, by collaborating with Microsoft, our human-centered approach using rich user interaction and data visualization can be informed and tested by professional data scientists who work with machine learning daily.

“Our investigation showed that interpretability is not a monolithic concept. Data scientists have different reasons to interpret models and tailor explanations for specific audiences, often balancing competing concerns of simplicity and completeness,” Hohman said.

 

Additional Information

Groups

College of Computing, OMS, School of Computational Science and Engineering

Categories
Student Research
Related Core Research Areas
Data Engineering and Science
Keywords
machine learning, visualization, software
Status
  • Created By: Kristen Perez
  • Workflow Status: Published
  • Created On: Apr 23, 2019 - 1:48pm
  • Last Updated: Apr 24, 2019 - 11:13am