What does AI regulation in policing look like?
While there have been calls for Artificial Intelligence (AI) regulation – see this White House press release for one example – details of what that means in practice are sparse.
Given my background conducting research with police departments, as well as being a data scientist who has contributed to the creation and implementation of different “AI” systems, I figured it would be useful to put down my thoughts on what practical regulation might look like. For a brief outline of this post, I will discuss:
- Regulation should be focused on real harms that can come from explicit uses of the system
- Builders should have documentation to ensure the system meets certain requirements before being used
- Users should have a plan in place to monitor the system in practice and be able to audit specific decisions after the fact if necessary
- There needs to be an external body defining specific standards on a case by case basis and ensuring these standards are continually met
It will not be a simple checklist, and in practice may be more of a set of guiding principles. But I do think it is possible to create reasonable regulation that promotes safety in the application of AI without being so onerous for the parties involved that it becomes infeasible.
What is the point of regulation?
Before we start, it is important to articulate what the end goals of regulation are. Many types of regulation focus on safety; that may be the safety of end consumers of products (FDA), or the safety of employees (OSHA). I think a reasonable place to start with AI regulation in policing is to focus on articulable harms, broadly defined, to either the general public or police officers, arising from the standard operation of AI systems.
This is important because the existence of an AI system does not intrinsically mean it will result in harm. It matters how the system will be used in practice.
Consider for example a person-based predictive policing tool. One way it could be used is to place an individual under surveillance and open investigations into their potential criminal behavior. Another, however, could be the READI program in Chicago, in which individuals are assigned a case worker and prioritized for other social services.
Systems that have more potential for harm should have stricter oversight of their operation. The former predictive policing scenario should have a higher standard to determine it is not causing harm in its application, whereas in the latter it is difficult to articulate a harm (I know you could expand harm to something like opportunity cost or distributive inequity, but those are much more difficult to put a finger on than systems that have explicit arrest-based outcomes).
To be clear, this isn’t to say the first surveillance scenario should be banned entirely – I think chronic offender initiatives are a good idea for police departments overall. They should however be built with foresight on how they can be used (or abused), and have a plan to mitigate those harms from the start.
Another scenario that is important in understanding how a system will be used is scope creep. I believe automated license plate readers (ALPRs) are a prime example of this in policing. ALPRs were initially created to flag when a stolen car passed by. A common use case shortly after they were more broadly adopted, though, was retrospective searching of license plate reads. Personally I am worried about how the latter searching scenario can be subject to abuse (imagine a police officer stalking a person by monitoring where their car goes), since it relies on cached data, whereas for flagging stolen vehicles you don’t need to cache non-hits at all.
This also highlights what I mean by harms broadly defined. The harm in this particular application is not related to the direct use case of ALPRs, but to potential unintended uses. Having clear end use cases for a system in practice will be important for identifying guide-rails for the application of the system. In the ALPR scenario, searching should have an auditing system in place to prevent overly broad queries (like those in place for criminal history checks in the states I am familiar with).
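As a rough sketch of what such an audit check could look like, the snippet below flags plate searches that lack a documented case number, span an overly broad time window, or do not query a specific plate. The log fields and thresholds here are my own assumptions for illustration, not any vendor’s actual schema.

```python
from datetime import datetime, timedelta

# Example threshold an oversight body might set; purely illustrative
MAX_WINDOW = timedelta(days=30)

def flag_search(search):
    """Return audit flags for a single (hypothetical) ALPR query log entry."""
    flags = []
    if not search.get("case_number"):
        flags.append("no associated case number")
    window = search["end_time"] - search["start_time"]
    if window > MAX_WINDOW:
        flags.append(f"time window of {window.days} days exceeds {MAX_WINDOW.days} day limit")
    if not search.get("plate"):
        flags.append("no specific plate queried")
    return flags

# Usage with a made-up log entry
example = {"officer_id": "A123", "plate": None, "case_number": "",
           "start_time": datetime(2023, 1, 1), "end_time": datetime(2023, 6, 1)}
print(flag_search(example))
```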
What is AI?
So far I have avoided defining AI – it is very tricky in practice. The way AI is commonly used now in general discussion or the media is often more akin to a marketing buzzword than any standard set of algorithms or applications.
For example, say I am a crime analyst and I run a report on street segments that had the most reported crime in the past three years. I then use those to suggest a hotspots policing strategy. Does that count as AI? If you think obviously not, well, the most popular spatial predictive policing systems are not all that different in practice. They use a model of prior crimes to predict the number of future crimes that will occur in some space-time window.
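To make the comparison concrete, here is a minimal sketch of that analyst report using pandas; it simply counts prior crimes per street segment and ranks them, which is not far removed from what many “predictive” spatial systems do under the hood. The column names and data are made up for illustration.

```python
import pandas as pd

# Assume one row per reported incident, with a street segment ID and a date
crimes = pd.DataFrame({
    "segment_id": [101, 101, 102, 103, 103, 103],
    "date": pd.to_datetime(["2021-03-01", "2022-07-15", "2021-11-30",
                            "2020-05-02", "2022-01-20", "2022-09-09"]),
})

# Count reported crimes per street segment over the past three years
recent = crimes[crimes["date"] >= crimes["date"].max() - pd.DateOffset(years=3)]
hot_spots = (recent.groupby("segment_id").size()
                   .sort_values(ascending=False)
                   .head(10))
print(hot_spots)  # top segments an analyst might suggest for a hot spots strategy
```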
It is not feasible to expect a crime analyst to meet a standard regulatory framework every time they run a simple report. Nor do I think every software tool that uses data and algorithms is intrinsically an AI tool. For example, laypeople often mistake different network database tools for AI. Having a tool that tells an analyst “person A and person B have links to address Z” is to me personally not AI, but I admit drawing a bright line is difficult.
In my best attempt to put some guide-rails on what we should be regulating though, here is my definition of AI systems:
Systems that use historical data to generate automated predictions, which are used repeatedly to make or aid actions the police department takes.
In this definition I have totally avoided defining artificial intelligence at all, and focused more on how the system will be used in practice. I am not totally satisfied with this definition (I’m not sure I will ever be able to make one I am satisfied with), but “automated”, “repeated”, and “aid actions” are to me more important than the actual algorithmic details here. I don’t think all are necessary, but most examples where I believe regulation makes sense meet at least one of those criteria.
So the network database that helps find links faster I don’t think is AI; it is not used in an automated fashion, and it requires input from a crime analyst. ALPRs, by contrast, automatically monitor and flag license plates. (Although this is fuzzy, as the potential harms come from an analyst or officer querying the system.)
I think ultimately regulation will need a strong human component at all stages, which will include whether a particular application is in-scope or out of scope for regulation and oversight.
Before Regulation vs After Regulation
I gave two other regulatory bodies at the start of this post – OSHA and the FDA. One large difference between these two agencies is the stage at which they exercise the majority of their oversight. The FDA has a pre-approval process before it allows drugs to go to market, while OSHA is more focused on ongoing, in-situ monitoring. AI regulation will need both.
Before a system is put into practice, I believe those building AI systems (whether software vendors or teams developing predictive systems internally) should meet certain reporting standards. For a brief checklist, these include:
- what data was used to train the model (e.g. was it a random sample, or a stratified sample from some specific source)
- how the model was evaluated (e.g. train/test split, different model accuracy metrics)
- whether the model is updated on some standard schedule
- how the model will be monitored to ensure its performance does not degrade over time
Having built and deployed many machine learning models in production, I consider these the bare minimum necessary to ensure a quality system, and they do not give up substantial intellectual property. I intentionally don’t list details of the type of model employed (e.g. deep learning, a boosted model, or some other system that has many components), as those to me start to infringe on the intellectual property of the software owners. (The data used to build the models is often important as well, but I believe disclosing that is necessary to judge whether the model makes sense for your use case.) It is easy to pile on more reporting requirements for software vendors; regulations should keep in mind, though, that they are not costless to those who try to meet them.
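As a hypothetical example of what this minimal documentation could look like in a machine-readable form (the field names and values here are my own invention, not an existing standard):

```python
from dataclasses import dataclass

@dataclass
class ModelReport:
    """Minimal pre-deployment documentation a vendor might supply."""
    training_data: str     # what data was used, and how it was sampled
    evaluation: str        # how the model was evaluated
    metrics: dict          # performance on the held-out data
    update_schedule: str   # whether/when the model is retrained
    monitoring_plan: str   # how performance degradation will be detected

# Placeholder values, purely illustrative
report = ModelReport(
    training_data="Stratified sample of incident reports, 2018-2022, one agency",
    evaluation="Temporal train/test split; final year held out",
    metrics={"auc": 0.74, "false_positive_rate": 0.12},
    update_schedule="Retrained quarterly",
    monitoring_plan="Monthly report comparing predicted vs. observed outcomes",
)
```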
In practice different end users may wish to see different model metrics, so I don’t believe you can have a simple list of “needs to meet X metric”. Given the READI example above, one city may be interested in ensuring that the predictive model is equitable across different racial categories, whereas another city may only be interested in overall false positive rates. I do not think it will be possible to have a standard checklist of model metrics that need to be met, but in practice it should not be too onerous for vendors to accommodate different metrics for individual use cases.
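For instance, here is a sketch of how one city might compute the kind of group-specific error metric mentioned above (false positive rate within each group); the data is fabricated purely to show the calculation.

```python
import pandas as pd

# Fabricated example: model predictions vs. observed outcomes, with a group label
df = pd.DataFrame({
    "group":     ["A", "A", "A", "B", "B", "B"],
    "predicted": [1, 1, 0, 1, 0, 0],
    "observed":  [0, 1, 0, 0, 0, 0],
})

# False positive rate per group: share predicted positive among observed negatives
negatives = df[df["observed"] == 0]
fpr_by_group = negatives.groupby("group")["predicted"].mean()
print(fpr_by_group)  # one city may compare these across groups, another only the overall rate
```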
Having oversight after a model is put into production is, I believe, just as important (if not more so) than the before-deployment regulation. Predictive models in practice are not “set it and forget it”; they should be consistently monitored and periodically updated to ensure the validity of the system. After-deployment regulation should include items like:
- continuous monitoring plan with regular reports
- plan to update model over time, either in an automated fashion or in response to model degradation
- ability to evaluate specific predictions after the fact, should the need arise to root-cause problems
These include components the police department may take on internally, as well as ones that fall to software vendors. Speaking as a software developer, it is hard to build machine learning systems where an end user can come back and ask, “I think this prediction is weird, why did it predict X in this situation?” (It requires caching multiple things: data, code, and model weights.) But I think that ability is crucial – these systems will make mistakes. The more complicated the system, the easier it is to make a mistake (and not know it is a mistake until you get real user feedback). Having a system that allows after-the-fact auditing of particularly egregious mistakes is to me necessary.
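As a sketch of what the underlying capability might require, the function below logs each prediction with enough context (inputs, model version, timestamp) to reconstruct it later. The structure is an assumption for illustration, not any particular vendor’s design.

```python
import json
from datetime import datetime, timezone

def log_prediction(log_path, model_version, inputs, prediction):
    """Append a single prediction record so it can be audited after the fact."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,  # ties the prediction to exact code/weights
        "inputs": inputs,                # snapshot of the data the model saw
        "prediction": prediction,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage with made-up values
log_prediction("predictions.jsonl", "risk-model-v1.3",
               {"person_id": 42, "prior_arrests": 3}, {"risk_score": 0.81})
```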
Who does the regulating?
So far I have discussed potential responsibilities for those building AI systems and for the police – but it is just as important to identify who will be doing the regulating. Ensuring that these requirements are met should not solely be in the purview of the software vendors or the police themselves; there should be some external oversight verifying that these standards have been met.
While it is possible that a federal agency will take on that responsibility – in policing it may be some arm of the DOJ, or perhaps more broadly NIST – if I were a city I would not hold my breath on that. Because these systems are so varied, and because individual cities may have different expectations, I am not sure federal oversight makes sense. Local oversight may really be the best course.
There are broadly two models currently in place in policing that I think are reasonable: one is dedicated civilian oversight boards, the other is relying on city employees or elected representatives. The most important component here is a mixture of people who actually know how the systems will be used and how AI systems are built in practice.
Many software vendors in this space are, to pardon my language, full of shit. You need competent data scientists, as well as people familiar with policing and representing the community, to weigh in on systems that can cause them harm, either directly or indirectly. Because there won’t be a simple checklist for each system, the regulators will need a certain level of expertise to be able to work with vendors and police directly on what makes sense on a case by case basis.
The regulators need to have some teeth (some civilian review boards are perpetually ignored), and should not be captured by the thing they are regulating (e.g. they should not be employed directly by the police department). In practice, though, I think cities can come up with oversight systems that provide a regular level of review and are workable for all parties involved.