Interpretable and Explainable Models

Project Members

Tanya Berger-Wolf, David Edward Carlyn, Wei-Lun Chao, Arpita Chowdhury, Dipanjyoti Paul, Yu Su

Project Goals

To develop novel approaches to explaining machine learning models' predictions on image-based biology tasks. Current machine learning models are often treated as black boxes, preventing us from learning meaningful biological information from their predictions. We aim to 1) develop interpretable machine learning models whose inner workings can reveal biologically meaningful information, and 2) develop explainable approaches that account for a black-box model's predictions, especially how it distinguishes closely related cases such as butterfly mimicry.

Project Overview

After extensive discussion and exploration, we have decided to pursue both directions: 1) explainable methods for black-box models and 2) interpretable models. Specifically, we leverage counterfactual reasoning and generative models for explanation, and we develop novel interpretable models. We have developed the INterpretable TRansformer (INTR), which can reveal the traits of animal species and rely on them for fine-grained recognition. We have also developed a prototype approach that explains an image classifier's predictions using pre-defined rubrics such as traits or attributes.
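
As a rough illustration of the INTR idea, the sketch below gives each class its own learnable query and lets the cross-attention maps between the queries and the image features localize the evidence used for each class. The layer choices and sizes here are illustrative assumptions, not INTR's exact design.

```python
import torch
import torch.nn as nn

class ClassQuerySketch(nn.Module):
    """Minimal sketch of a class-query interpretable transformer (INTR-style).

    Each class gets a learnable query; the cross-attention weights between a
    query and the image features indicate where that class's evidence lies.
    """

    def __init__(self, num_classes, d_model=256, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_classes, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # A shared head scores each class query independently.
        self.score = nn.Linear(d_model, 1)

    def forward(self, feats):
        # feats: (B, N, d_model) flattened backbone feature map (N patches).
        B = feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)   # (B, num_classes, d)
        out, attn = self.cross_attn(q, feats, feats)      # attn: (B, num_classes, N)
        logits = self.score(out).squeeze(-1)              # (B, num_classes)
        return logits, attn  # attn maps show per-class evidence locations
```

Reshaping the returned attention map for the predicted class back to the backbone's spatial grid and overlaying it on the input image highlights the regions, and hence the candidate traits, that the model relied on.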
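
On the explanation side, counterfactual reasoning with a generative model can be sketched as a small optimization in the generator's latent space: find a minimal latent edit that flips the classifier's prediction, then decode it to see which visual change mattered. The function names, the distance penalty, and all hyperparameters below are illustrative assumptions rather than the project's implementation.

```python
import torch
import torch.nn.functional as F

def counterfactual_search(classifier, generator, z_init, target_class,
                          dist_weight=0.1, steps=200, lr=0.05):
    # classifier and generator are assumed pretrained; z_init is the
    # (1, latent_dim) latent code of the image being explained.
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([target_class], device=z_init.device)
    for _ in range(steps):
        x = generator(z)            # decode the current latent to an image
        logits = classifier(x)      # (1, num_classes)
        # Push the prediction toward the target class while keeping the
        # latent edit small, so the counterfactual stays near the original.
        loss = (F.cross_entropy(logits, target)
                + dist_weight * (z - z_init).pow(2).sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()    # the counterfactual image
```

Comparing the decoded counterfactual with the original image then shows, for example for two mimicking butterfly species, the smallest visual change that switches the model's decision.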