Submitted by Edward Grierson on Thu, 11/09/2025 - 13:19
Machine learning tools have shown great promise in forecasting reoffending. However, the historical datasets on which they are trained can perpetuate or exacerbate existing societal biases.
In his new research paper, PhD student Jacob Verrey outlines steps to 'debias' crime prediction models using a proposed fairness scale. He then explores how these tools can be carefully deployed to reduce crime, as well as address structural bias in the justice system.
Predicting crime
Reoffending, also known as 'recidivism', carries significant economic costs for the public and for police resources, as well as obvious emotional costs for victims. If the criminal justice system can predict recidivism, it may be able to target resources to prevent these crimes and their serious costs.
Much of Verrey's work revolves around the development of forecasting models: using machine learning to predict who will and who will not commit a crime in a way that is both ethical and statistically sound.
The process begins with data. Machines learn patterns in historical records to predict who will commit another crime, and law enforcement can then launch interventions to prevent these crimes from occurring. Many areas of criminology, such as police misconduct and domestic homicide, already make heavy use of machine learning.
However, machine learning models have a key disadvantage: they use historical patterns in data to predict the future. So, if the historical data contains societal biases, the models can recreate these biases in their predictions. These societal biases can include gender, racial, and social class differences in who ends up in prison.
The fairness scale
To alleviate the risk of machine learning models recreating these societal biases, Verrey created the ‘fairness scale’ for recidivism. For him, it represents an opportunity to bake many academic concepts of fairness into a machine learning model.
In other words, to debias a model, an individual must first define what a lack of bias, or fairness, looks like, and there is no uncontested definition of fairness. Defining fairness is especially controversial in criminal justice datasets: there are large differences in reoffending between sensitive groups, and it is unclear whether these differences are fair, in that they reflect a genuine difference in crime, or whether they instead reflect societal bias.
For example, criminal justice datasets show that some races are more likely to reoffend than others. In the PNC records Verrey was working with, Black and White offenders are more likely to reoffend than Asian offenders. Does this difference reflect societal bias, in that the criminal justice system unfairly targets some races more than others? Or does it reflect a genuine difference in crime between these groups? It is unclear. And if police do not know how much of these differences reflects societal bias, they also do not know to what extent a crime prediction model needs to be 'debiased'.
“The field of computer science and AI has a lot of fancy tools, algorithms, and jargon,” Verrey says. “The question is how we translate them to criminal justice. The fairness scale is a great example of this.”
This is the main contribution of the fairness scale. It proposes multiple definitions of what fairness, or a lack of bias, looks like, and practitioners can choose which definition to bake into their crime prediction models. Some definitions, like statistical parity, mirror affirmative action in the United States in that the model gives disadvantaged groups 'bonus points' when making crime predictions. Other definitions are less extreme: 'fairness through unawareness', for example, states that a model is fair if it does not explicitly use a sensitive trait such as race in its predictions. Still other definitions equalise some aspect of the model's performance across groups. Ultimately, Verrey argues, practitioners are responsible for deciding which fairness definitions are appropriate for their crime prediction model.
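Two of these definitions can be illustrated in a few lines of code. The sketch below is ours, not the paper's: the function names and toy data are invented, and real implementations are considerably more involved. It computes the statistical-parity gap between groups and applies 'fairness through unawareness' by stripping sensitive traits from the data before a model ever sees them.

```python
# Illustrative sketch of two fairness definitions; names and data are
# hypothetical, not taken from Verrey's paper.

def statistical_parity_gap(predictions, groups):
    """Difference in the rate of 'high risk' labels between groups.

    A gap of 0 means the model flags every group at the same rate,
    which is what statistical parity demands."""
    rates = {}
    for g in set(groups):
        flagged = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(flagged) / len(flagged)
    values = sorted(rates.values())
    return values[-1] - values[0]

def drop_sensitive_features(records, sensitive=("race", "gender")):
    """'Fairness through unawareness': remove sensitive traits so the
    model cannot explicitly use them in its predictions."""
    return [{k: v for k, v in r.items() if k not in sensitive}
            for r in records]

# Toy predictions: 1 = predicted to reoffend, 0 = predicted not to.
preds = [1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]
gap = statistical_parity_gap(preds, groups)  # group A flagged at 2/3, B at 1/3
```

Note that 'fairness through unawareness' is the weakest of these definitions: a model can still infer a sensitive trait from correlated features, which is one reason the fairness scale offers stronger alternatives.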
A key advantage of the fairness scale, Verrey argues, is that it provides multiple definitions of fairness. This is important because each definition contains political assumptions, as well as semantic and technical ones. So, when practitioners use these definitions to debias a model, they need to know what assumptions are embedded in the definition they choose.
“A lot of the time, when you look at the reviewed studies of recidivism forecasting, you see that most of them didn’t debias their model,” he argues. “Even the very few studies that do, they often just use one definition and run with it. So maybe, before we debias models, we should discuss the potential trade-offs and political ramifications of different methods.”
Applying the fairness scale
To apply the fairness scale, Verrey first created crime prediction models using the Police National Computer (PNC), a national database of convicted offenders. Namely, he created two crime prediction models: one to predict general reoffending over a three-year period, another to predict violent reoffending over the same period. He then used the fairness scale to successfully bake each definition of fairness into both models.
After creating these debiased models, Verrey illustrated how police could actually use them to prevent reoffending. However, he did so cautiously, to ensure that the societal benefits produced by these models outweigh their potential harms.
First, he argues that police can use these models to identify offenders at high risk of reoffending. Once identified, these offenders can be sent a nudge, such as a text message, warning them not to offend again. Nudges are extremely low-impact, in that they do not significantly disrupt an offender's life; yet they are powerful, in that they have been shown to prevent future reoffending.
According to Verrey, the low impact of a nudge is an important safeguard against false positives. A false positive occurs when the crime prediction model incorrectly labels a low-risk offender as high-risk, causing them to receive an unwarranted police intervention. Verrey's models are configured to produce very few of these mistakes. And when one does occur, a nudge ensures that the worst outcome is that the mislabelled person receives a non-obligatory text message, thereby minimising the potential harm.
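The trade-off behind configuring a model this way can be shown with a toy example. The sketch below is hypothetical and is not Verrey's actual model: a prediction model outputs a risk score per offender, and the threshold at which someone is labelled 'high-risk' controls how many false positives occur.

```python
# Hypothetical sketch: how a decision threshold controls false positives.
# Scores and labels are invented for illustration.

def label_high_risk(scores, threshold):
    """Flag everyone whose risk score meets or exceeds the threshold."""
    return [s >= threshold for s in scores]

def count_false_positives(flags, truly_high_risk):
    """Count offenders flagged as high-risk who are in fact low-risk."""
    return sum(1 for flagged, truth in zip(flags, truly_high_risk)
               if flagged and not truth)

scores = [0.2, 0.4, 0.55, 0.7, 0.9]            # model's risk scores
truly_high = [False, False, False, True, True]  # ground truth (unknown in practice)

# A lower threshold flags more people, including a low-risk offender...
fp_low = count_false_positives(label_high_risk(scores, 0.5), truly_high)

# ...while a stricter threshold avoids that mistake, at the risk of
# missing genuinely high-risk offenders instead.
fp_high = count_false_positives(label_high_risk(scores, 0.6), truly_high)
```

Raising the threshold trades false positives for false negatives, which is why pairing a strict threshold with a low-impact intervention like a nudge limits the harm on both sides.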
Second, Verrey showed how to deploy these models in a way that mitigates the risk of algorithmic bias: the risk that police uncritically accept the output of a machine learning model.
“For example, an officer may say ‘John Doe may commit another crime because the computer says so’ instead of saying ‘John Doe may commit another crime because of X, Y, and Z risk factors’,” Verrey explains.
To guard against this sort of algorithmic bias, Verrey suggests a deployment method where officers make their own predictions before seeing those made by the model. Research indicates that this process ensures the model's output is used to inform, not replace, an officer's final decision.
Third, Verrey proposes a randomised control trial to test the crime prediction model. He suggests police test the model on recently released offenders, comparing a treatment group that receives the machine-learning-based nudge with a control group that gets a business-as-usual approach. If the treatment group commits fewer crimes than the control group, it would be powerful evidence that the model works and can be deployed more widely.
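The trial design described above can be sketched under simple, invented assumptions (a 50/50 random split and made-up follow-up data; none of the numbers come from the paper):

```python
# Minimal sketch of a randomised control trial for a nudge intervention.
# Offender IDs, group sizes, and outcomes are all illustrative.
import random

def assign_groups(offender_ids, seed=42):
    """Randomly split offenders into (treatment, control) arms.

    A fixed seed makes the split reproducible for auditing."""
    rng = random.Random(seed)
    ids = list(offender_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

def reoffending_rate(group, reoffended):
    """Fraction of a group observed to reoffend during follow-up."""
    return sum(1 for i in group if reoffended[i]) / len(group)

offenders = list(range(100))          # recently released offenders
treatment, control = assign_groups(offenders)

# Hypothetical follow-up data: who reoffended within the trial period.
reoffended = {i: (i % 5 == 0) for i in offenders}

# A lower rate in the treatment arm would be evidence the nudge works.
effect = (reoffending_rate(control, reoffended)
          - reoffending_rate(treatment, reoffended))
```

Random assignment is what makes the comparison credible: because membership in each arm is determined by chance rather than by risk level, any sizeable difference in reoffending rates can be attributed to the nudge itself.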
Future hopes
Verrey hopes that the fairness scale, when applied practically, can improve public perceptions of the police.
“Police are accused of racism, sexism, and classism all the time,” he says. “So, if we’re able to show that we’re working to combat these biases, that would have a positive effect on public trust and police legitimacy.”
He emphasises the need for further research into these processes. One possible area for investigation is whether multiple fairness definitions can be simultaneously baked into a crime prediction model. This would require different implementation techniques from the ones discussed in the paper; yet, if accomplished, it would allow a model to be considered even more fair, or debiased.
Overall, however, he is optimistic that his proposed fairness scale can allow for more effective prediction of recidivism.
“One of the primary reasons I’m doing this research is because it’s edifying,” says Verrey. “I’ve been able to translate these ideas from computer science into the real world. And in doing so, my research could have a positive impact on the criminal justice system.”