Why crime analysts should learn SQL

Most analysts I know use tools, like Crystal Reports, to export data from their record management systems (RMS) and conduct analysis in Excel. There is a better way – using SQL directly to both extract data and conduct various analyses.

SQL stands for Structured Query Language. It is a standard for interacting with relational databases, which all RMS under the hood are a set of tables that are related to one another. So for example, you may have a Crystal Report that you use to pull all of the robberies, export to a CSV file, and then make a line graph of the robberies per month for a CompStat like report.

This process involves three steps; click the buttons to export the data from Crystal Reports, open the exported file with Excel, make a pivot table of the counts by month, and then create the graph. This is not per se a bad process, but I prefer the approach using SQL directly for several different reasons:

For the first item, I have seen several systems that have close to 100 saved reports, all for minor differences. This becomes unmanageable, and likely people recreate the same reports several times over. For the second, when there are errors in the analysis, when you are doing manual work (like point and clicking in an application or copy-pasting in a spreadsheet), it becomes impossible to track down errors. Did you select the wrong Crystal Report? Did you accidentally delete rows in the Excel file?

So how might a workflow work in SQL? To get the aggregate number of robberies per month may be as simple in your RMS as:

-- SQL code to get counts of robberies per month
SELECT 
  YEAR(report_date), 
  MONTH(report_date), 
  COUNT(*)
FROM Incidents
WHERE NIBRS = '120'
GROUP BY YEAR(report_date), MONTH(report_date)

This to me is simpler than using Crystal Reports to pull data and make new reports. So do you want a different NIBRS category? Just change the WHERE clause. Do you want the individual incidents to say put in a map? That may look like:

-- SQL code to get robbery incidents with lat/lon
SELECT
  incidentid,
  report_date,
  lat,
  lon
FROM Incidents
WHERE NIBRS = '120'

Once you know SQL, which is not difficult to learn, you will be able to create new queries every time you need slightly different sets of data, or different types of aggregations. Then you can save the queries, so it is easier to do the downstream error analysis.

Now this did not discuss how to make the chart in the original goal, this will just produce the table. My favorite way to run SQL is via python. So for example, a python script to get the data and make a graph may look like:

# python code for running SQL query and making graph
import pandas as pd
import pyodbc
import matplotlib.pyplot as plt

# setting the connection and querying the data
conn = pyodbc.connect(...)
sql_query = """...your query here"""
data = pd.read_sql(sql_query,conn)

# making the plot
fig, ax = plt.subplots()
ax.plot(data['Date'],data['Count'])
fig.savefig('RobberyChart.png')

This is simplified, as it foregoes needing to create the connection string (see this blog post on figuring that out for your own system). You also may need to do some data manipulation in python post the SQL query (such as creating a date field that is the beginning of the month). But in the end your python script to accomplish the task of querying robberies and creating a graph will be quite short.

For those interested in learning more SQL, I have a few additional resources:

To end the post, crime analysts should learn SQL because it is a skill that is not only useful for crime analysis, but is a standard that almost all analysts in all fields know. Take it as an opportunity to level up, and make your skills more transferable to other positions in the future.