Auditing Black Box Models

Models learned through machine learning can be hard to interpret. A model that takes many inputs and has many parameters may depend on those inputs in complicated ways. This makes it hard to know whether, for example,

• the model might indirectly depend on protected attributes of the input that users of the model would prefer it not use (for example, race via zip code), or

• the model might depend heavily on input attributes that domain knowledge suggests should not be important factors (for example, a system that predicts the output of a chemical reaction but appears to be influenced by the time the reaction is run).

This tutorial will teach the audience, through a sequence of simple examples in a Jupyter notebook, to use a software library developed by the presenters. The presenters will focus on the current strengths and limitations of the method, and on how attendees can apply it to their own datasets.
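To make the goal concrete, here is a minimal sketch of the kind of question such an audit asks. It does not use the BlackBoxAuditing library's API or its obscuring procedure; it is a simpler permutation-style check on synthetic data, shuffling one feature of a held-out set at a time and reporting the resulting drop in a scikit-learn model's accuracy as a rough measure of how much the model leans on that feature.

# Illustrative sketch only (not the BlackBoxAuditing API): estimate how much a
# trained model relies on each input feature by shuffling that feature in the
# test set and measuring the drop in accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data standing in for a real dataset with possibly sensitive features.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = accuracy_score(y_test, model.predict(X_test))

# Shuffle one feature at a time; a large accuracy drop suggests the model
# depends heavily on that feature (directly or through correlated features).
for j in range(X_test.shape[1]):
    X_perturbed = X_test.copy()
    X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
    drop = baseline - accuracy_score(y_test, model.predict(X_perturbed))
    print(f"feature {j}: accuracy drop {drop:.3f}")

The library taught in the tutorial goes further than this shuffle-based check, but the output has the same flavor: a ranking of features by how much the model's behavior degrades when they are obscured.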

Materials

Papers

Suresh, Carlos, and Charlie are presenting the tutorial, but the techniques themselves were designed in collaboration with many other people; the work is described in the following papers:

BlackBoxAuditing repository

The source code for the BlackBoxAuditing library itself is hosted in a GitHub repository.