A Q&A with Discussion Meeting author Kosuke Imai

Kosuke Imai is Professor of Government and of Statistics at Harvard University. He is one of the authors of the paper being read before the Royal Statistical Society at its next Discussion Meeting: 'Experimental evaluation of algorithm-assisted human decision-making: Application to pretrial public safety assessment'.

In this Q&A, he talks about the paper, which develops a methodology for evaluating algorithmic recommendations and applies it in a legal context.



This is interdisciplinary work. How did you manage to find a common language and explain statistical concepts to your law scholar colleagues?
My law school colleague, Jim Greiner, has a PhD in statistics (along with a JD, of course)! In fact, we have known each other since we were in graduate school together. In the past, I have collaborated with scholars who have little statistics background, but with Jim we have no problem communicating statistical concepts.

Do you think investigators/judges/juries are willing to accept algorithms to assist in their decision-making? Do you think that algorithms could eventually substitute for judges/juries in court rulings?
What surprised me when I started this project is that, at least in the United States, algorithmic recommendations have been used to assist judicial decisions for a relatively long time. Some judges say that they do not think algorithms are helpful, but it is an empirical question whether algorithmic recommendations affect their decisions. It is possible that judges are unconsciously influenced by these recommendations. This is why, I think, experimental studies like ours are useful. We can empirically evaluate whether algorithms influence human decisions.
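To give a flavour of what such an empirical evaluation can look like, here is a minimal sketch, assuming cases are randomly assigned so that the algorithmic recommendation is either provided to the judge or withheld. This is an illustrative sketch in Python rather than the authors' analysis, and the function and variable names are hypothetical:

```python
# Illustrative sketch (not the authors' code): does providing an algorithmic
# recommendation change judges' decisions, assuming random assignment of
# whether the recommendation was shown to the judge?
import numpy as np

def decision_rate_effect(decision, shown):
    """Difference in the rate of a decision of interest (e.g. release without
    cash bail) between cases where the recommendation was shown and withheld."""
    decision = np.asarray(decision, dtype=float)  # 1 = decision of interest, 0 = otherwise
    shown = np.asarray(shown, dtype=bool)         # True = recommendation provided to judge

    p_shown = decision[shown].mean()
    p_withheld = decision[~shown].mean()
    diff = p_shown - p_withheld

    # Large-sample standard error for a difference in proportions
    se = np.sqrt(p_shown * (1 - p_shown) / shown.sum()
                 + p_withheld * (1 - p_withheld) / (~shown).sum())
    return diff, (diff - 1.96 * se, diff + 1.96 * se)

# Hypothetical usage with simulated data
rng = np.random.default_rng(0)
shown = rng.integers(0, 2, size=1000).astype(bool)
decision = rng.binomial(1, np.where(shown, 0.55, 0.50))
print(decision_rate_effect(decision, shown))
```

If the recommendation had no influence on judges, the estimated difference would be close to zero; a clear difference is evidence that the recommendation affects their decisions.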

The question of whether algorithms should replace human judges in courts is an interesting one. One problem is accountability, both political and legal. In many democratic societies, judges are either elected or politically appointed. Judges themselves, or the politicians who appoint them, can be held accountable through elections or other democratic mechanisms. If algorithms replace human judges, who is going to choose the algorithms? Who should be held accountable for 'wrong' decisions made by algorithms? If we go down this path, our society will need a new system to deal with these potential issues.

What are your thoughts on the transparency of algorithms used in criminal justice systems and the need for them to be validated and tested for reliability? How do you think this can be accomplished?
In public policy settings, including criminal justice systems, transparency of algorithms is essential. All inputs used for algorithms as well as the details of algorithms themselves should be publicly disclosed. The pre-trial risk assessment instrument we studied in our paper is also transparent. The exact calculation of risk assessment scores is public information. 

This openness facilitates studies like ours. In fact, transparency is a major strength of algorithmic recommendations. We know what inputs are used for algorithms, meaning that we know exactly what factors are considered. This level of transparency is difficult to achieve for human decision makers. Algorithms may be biased, but it might be much easier to figure out why they are biased and how to fix them, so long as the transparency of algorithms is guaranteed. In contrast, fixing the bias of human decisions is probably much more difficult. An entire subfield of psychology is devoted to studying the biases of human decision-making.

A major aspect of your evaluation related to the fairness of the judge’s decisions.  Can you describe what you mean by ‘fairness’ in this context?
Fairness is a difficult concept, and there are many competing definitions in the literature. There is no one best definition out there. In this paper, however, we consider a new notion of fairness, which brings in ideas from the field of causal inference. We call a decision fair if those who are similarly affected by the decision receive a similar decision. In our study, a judge's bail decision can affect different arrestees differently. For example, an arrestee may be able to pay bail and get released, but then commit a new crime while awaiting trial. Another arrestee may find it difficult to pay bail of the same amount and, as a result, may be incarcerated before appearing in court. Fair decisions need to account for the way in which a decision can affect people.

The basic idea of our fairness notion is that if you and I are affected similarly by the decision, then a fair decision-maker should not treat us differently because of our protected attributes, such as gender or race. We call this fairness concept 'principal fairness' because it is based on the notion of principal stratification in causal inference: conditional on potential outcomes, a fair decision should not depend on protected attributes.
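For readers who like to see the idea in symbols, here is one way to write it down; the notation below is an illustrative sketch rather than a quotation from the paper. Let D be the decision, A a protected attribute, and R(d) the outcome an arrestee would experience under decision d, so that the pair (R(0), R(1)) — the principal stratum — captures how a person is affected by either decision.

```latex
% Illustrative notation (not necessarily the paper's): D is the decision,
% A a protected attribute, and R(d) the potential outcome under decision d.
% Within each principal stratum (R(0), R(1)), the decision must not depend on A:
\[
  P\bigl(D = d \mid R(0), R(1), A = a\bigr)
  = P\bigl(D = d \mid R(0), R(1), A = a'\bigr)
  \quad \text{for all } d,\; a,\; a'.
\]
```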

In many experimental situations, such as randomised clinical trials in medicine, people can be recruited to the study only if they have given their informed consent. Did the people coming before the judge in your experiment provide informed consent?
In this case, the risk assessment instrument was already in place. The jurisdiction in question was using it. They were interested in getting our help evaluating the bias and effectiveness of the algorithmic recommendations. In other words, we were evaluating an existing policy rather than a new one. For example, we were not able to test new algorithms in our study.

What distinguishes your approach to evaluating the impact of algorithmic recommendations from previously used methods?
The main contribution of our approach is that we evaluate how algorithmic recommendations affect human decisions. Many prior studies examined the bias and accuracy of algorithmic recommendations themselves. However, in many settings including criminal justice systems, humans make final decisions. Therefore, we believe that it is equally important to study how algorithmic recommendations influence the bias and accuracy of human decisions. It is entirely possible, for example, that the biases of algorithms and humans cancel out or amplify one another. We believe that our experimental evaluation framework can be deployed in a variety of settings where algorithmic recommendations are used by human decision makers.

Your methodology is intended to be general. But how general can it really be, when algorithms can be used in such a wide range of contexts? 
Our methodological framework is very general, and in principle can be applied to many different settings if one is interested in studying how algorithmic recommendations affect human decisions.

Of course, every application is unique, and researchers need to consider various factors that are at play. In our application, one aspect we hope to study is the dynamic interaction between algorithmic recommendations, judges, and arrestees. It would be interesting to study the repeated interactions between algorithms, judges, and arrestees to understand the long-term impacts of algorithm-assisted human decision making on our society. There can also be additional methodological complications, such as missing data and measurement error problems. Any good applied statistician would have to adapt their analysis to address these unique challenges.

This is why I love applied statistics – it’s a combination of developing a general methodology and applying it in a tailored fashion to real-world problems.

Read more about the paper and download the preprint
Register for the online Discussion Meeting
Register for the pre-meeting DeMO