Using Python to Derive Insights from 911 Calls Data

An exploratory analysis of 911 call records from Montgomery County in the United States: an attempt to analyze the data and derive meaningful insights using Python.

Yãshñã Bëhërã
Geek Culture


Any emergency response system should be prepared to handle sudden or unexpected situations, e.g., a fire, a crime, a car crash, or a medical emergency that requires immediate assistance from the police, fire department, or ambulance. Such a system prevents fatalities and injuries, reduces damage to buildings and equipment, protects the environment and the community, and accelerates the resumption of normal operations.

911 is the emergency hotline number in North America, used only in emergency situations. Whenever a call is made to 911, basic information is gathered: the location/street name of the emergency, the caller's phone number, the nature of the emergency, and details about it, such as a physical description of a person who may have committed a crime, a description of any fire that may be burning, or a description of injuries or symptoms being experienced by a person having a medical emergency.

We will use the 911 calls dataset from Kaggle. This is an attempt to analyze a sample of the calls made in Montgomery County. Montgomery County was founded on 6th September 1776 and was named after Major General Richard Montgomery. It is one of the most populous counties in the United States; according to the United States Census Bureau, the county's estimated population was 1,050,688 as of 2019. The dataset has 99,492 entries and 9 columns, namely Latitude, Longitude, Description, Zip Code, Title/Cause, Timestamp, Township, Address, and e. From this dataset, we will try to analyze the reasons for the calls and other dynamics such as the frequency and the time frames during which the calls are made.

We will start by importing the necessary libraries: Pandas for basic data manipulation, along with Matplotlib and Seaborn for data visualization.

#import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid")
%matplotlib inline

The next step is to load our dataset. You can download the dataset from here and read it into Python as below.

#importing data from csv file
calls=pd.read_csv("GIVE THE PATH NAME WHERE YOU HAVE SAVED YOUR FILE")

A Brief Intro About The Dataset

After importing, we will try to get some understanding of what we have on hand by checking the types of data available in our dataset using the .info() function.

#checking data types
calls.info()

From the output above we can see that there are 99,492 rows and 9 columns, of which 3 are float type, 5 are object type, and 1 is integer type. Columns like longitude, latitude, township, zip, and address tell us about the location of the incident, whereas columns like description and title tell us about the nature of the situation. We can also observe that there are some missing values in the dataset; this can be checked using the .isnull() function.

#checking for null values
calls.isnull().sum()

In the output above we can see that a major portion of the zip code column has null values, along with the township name and address columns. A null value means that a data value does not exist or is undefined in the dataset, because of which we do not get a complete picture of the data and it might create problems later. To avoid this, we'll fill in the null values using forward filling (.ffill()). Forward fill fills a missing value in the dataframe by propagating the last valid observation forward.

#filling up the null values
columns=["zip","twp","addr"]
calls.loc[:, columns]=calls.loc[:, columns].ffill()
calls.info()

We can further modify any column as per our requirements, for example the title and timeStamp columns. The title column holds the cause of each 911 call. We are going to categorize all the calls into three broad reasons: EMS, Fire and Traffic.

#grouping all titles under 3 main reasons: EMS, Fire, Traffic
calls['Reason'] = calls['title'].apply(lambda title: title.split(':')[0])
calls.head()

To make sure that the timeStamp column is in datetime format, we convert it using the pd.to_datetime() function and then split it into Hours, Month, Day of Week, Date and Year columns.

#checking type of timeStamp column
type(calls['timeStamp'])
Out[]: pandas.core.series.Series

#converting series to datetime format
calls["timeStamp"]=pd.to_datetime(calls["timeStamp"])
#inspect the first timestamp to see the attributes available (hour, month, etc.)
time=calls["timeStamp"].iloc[0]
#Dividing timeStamp column into hours, month, week, year and date
calls["Hours"]=calls["timeStamp"].apply(lambda time : time.hour)
calls["Month"]=calls["timeStamp"].apply(lambda time : time.month)
calls["Day of Week"]=calls["timeStamp"].apply(lambda time : time.dayofweek)
calls['Date']=calls['timeStamp'].apply(lambda time: time.date())
calls["Year"]=calls['timeStamp'].apply(lambda time: time.year)
calls.head()

Next we will drop the latitude, longitude, description, e and timeStamp columns, as we will not require them for our analysis.

#dropping columns that are not required
calls.drop(["lat","lng","e","desc","timeStamp"],axis=1,inplace=True)

As our next step, given that 2015 has only 3 months’ entries in the dataset, we will drop all entries for 2015 and analyze only the calls made in the year 2016.

#dropping all the rows from year 2015
calls.drop(calls.loc[calls['Year']==2015].index, inplace=True)

Analysis of Dataset

After cleaning and removing all unnecessary data, we'll now analyze the dataset, starting with the top 10 most frequent causes for dialing 911.

Top 10 causes for calling 911

#top 10 causes
calls["title"].value_counts()[:10]

Top 10 Towns

#Top 10 townships
calls["twp"].value_counts()[:10]

Top 5 Emergency Situations for the Top 3 Towns

town1=calls[calls["twp"]=="LOWER MERION"]["title"].value_counts().head()
town1
town2=calls[calls["twp"]=="ABINGTON"]["title"].value_counts().head()
town2
town3=calls[calls["twp"]=="NORRISTOWN"]["title"].value_counts().head()
town3

From the above few blocks, we can infer that the most common reason for calling 911 was vehicle accidents, and the towns with the highest number of calls were LOWER MERION, ABINGTON and NORRISTOWN. From this we can conclude that there are a lot of traffic-related issues in these towns.
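
If you want to check this numerically rather than comparing the lists above by eye, a small cross-tabulation of township against reason does the job. The snippet below is only a sketch and assumes the calls DataFrame built so far, with its twp and Reason columns.

#count of calls per reason in each of the three busiest townships
top_towns = ["LOWER MERION", "ABINGTON", "NORRISTOWN"]
subset = calls[calls["twp"].isin(top_towns)]
pd.crosstab(subset["twp"], subset["Reason"])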

Now we are going to represent the main reasons for dialing 911 graphically. We can see that people mostly call 911 for EMS reasons, followed by traffic and fire issues.

#top most reason to call 911
sns.countplot(x='Reason',data=calls,palette='viridis')
plt.style.use('dark_background')

In our dataset, day of week and month are in numerical format; we'll convert them to their string day and month names. First we'll make two dictionaries, one for days and one for months, then apply them using the .map() function.

#converting integer value into its actual string names to the day of the week and month using .map() function
weekmap={0:"Mon", 1:"Tue", 2:"Wed", 3:"Thur", 4:"Fri", 5:"Sat", 6:"Sun"}
calls["Day of Week"]=calls["Day of Week"].map(weekmap)
monthmap={1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun", 7:"Jul", 8:"Aug", 9:"Sep",10:"Oct", 11:"Nov", 12:"Dec"}
calls["Month"]=calls["Month"].map(monthmap)
calls.head()

Next we’ll see the frequency of calls per month, day of week and hour

#Frequency of calls per month
plt.style.use('dark_background')
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
calls["Month"].value_counts().reindex(months).plot(kind="bar")
plt.xlabel('Month')
plt.ylabel('Frequency')
plt.title('Frequency of calls per month')
plt.show()

#Frequency of calls per day of the week
weeks = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]
calls["Day of Week"].value_counts().reindex(weeks).plot(kind="bar")
plt.xlabel('Day of the Week')
plt.ylabel('Frequency')
plt.title('Frequency of calls per day of the week')
plt.show()

#hourly distribution of calls
calls['Hours'].hist(bins=30)
plt.xlabel('Hours')
plt.ylabel('Frequency')
plt.title('Frequency of calls per hour')
plt.show()

From the above graphs we can observe that the maximum number of calls were made on weekdays rather than weekends, between 15:00 and 17:00 hrs. Across the months, calls increase in January and July, whereas there is a massive drop in calls after July.
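
If you prefer exact numbers to reading bar heights, the top few value counts confirm the same pattern. This is just a quick sketch on the same calls DataFrame.

#numeric check of the busiest periods seen in the plots above
print(calls["Month"].value_counts().head(3))        #busiest months
print(calls["Day of Week"].value_counts().head(3))  #busiest days of the week
print(calls["Hours"].value_counts().head(3))        #busiest hours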

Count of EMS, Fire and Traffic per day of week, month and hour

#Count of EMS, Fire and Traffic per day of the week
fig_dims = (12, 5)
fig, ax = plt.subplots(figsize=fig_dims)
sns.countplot(x='Day of Week', data=calls, hue='Reason', order=weeks, palette='Dark2', ax=ax)
plt.legend(bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.5)
plt.title('Count of each reason per day of the week')
plt.show()

#count of EMS, Fire and Traffic month wise
fig_dims = (12, 5)
fig, ax = plt.subplots(figsize=fig_dims)
sns.countplot(x='Month', data=calls, hue='Reason', order=months, palette='Dark2', ax=ax)
plt.legend(bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.5)
plt.title('Count of each reason per month')
plt.show()

#count of EMS, Fire and Traffic hour wise
fig_dims = (14, 5)
fig, ax = plt.subplots(figsize=fig_dims)
sns.countplot(x='Hours', data=calls, hue='Reason', palette='Dark2', ax=ax)
plt.legend(bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.5)
plt.title('Count of each reason per hour')
plt.show()

The frequency of fire calls remains constant through the week, whereas calls for EMS and traffic are higher on Mondays and Wednesdays.

The maximum number of calls for fire and EMS were made in July, while the maximum for traffic was in January.

We saw that a high number of EMS calls were made around 12 noon, whereas for fire and traffic the frequency was highest in the evening, between 5 pm and 6 pm.
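
As a quick sanity check of those peak hours, we can compute the most frequent hour for each reason directly. This is a sketch using the Reason and Hours columns created earlier.

#peak (most frequent) hour for each reason
calls.groupby("Reason")["Hours"].agg(lambda s: s.value_counts().idxmax())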

Now we will try to find out whether there is any relationship between month, day of week and hour in the frequency of calls.

Month v/s Hour

#grouping number of calls month/hour
monthhour=calls.groupby(["Month","Hours"]).count()["Reason"].unstack().reindex(months)
plt.figure(figsize=(15,6))
sns.heatmap(monthhour,cmap='viridis',linewidths=1)
plt.style.use('dark_background')

From the above heatmap we can infer that the maximum number of calls were made from January to July, between 12:00 and 17:00 hrs.
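
To pin down the single busiest cell in this heatmap, we can stack the pivoted table and take the index of its maximum. A minimal sketch, assuming the monthhour table built above:

#busiest month/hour combination and its call count
busiest = monthhour.stack()
print(busiest.idxmax(), int(busiest.max()))

The same two lines work for the weekhour and weekmonth tables below.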

Week v/s Hour

weekhour=calls.groupby(["Day of Week","Hours"]).count()["Reason"].unstack().reindex(weeks)
plt.figure(figsize=(15,6))
sns.heatmap(weekhour,cmap='viridis',linewidths=1)
plt.style.use('dark_background')

From the above heatmap we can see that weekdays receive more calls than weekends, again between 12:00 and 17:00 hrs.

Week v/s Month

#grouping count of calls week/month
wm1=calls.groupby(["Day of Week","Month"]).count()["Reason"].unstack().reindex(weeks)
wm2=wm1.transpose().reindex(months)
weekmonth=wm2.transpose()
plt.figure(figsize=(15,6))
sns.heatmap(weekmonth,cmap='viridis',linewidths=1)
plt.style.use('dark_background')

From the above heatmap we can observe that Fridays received the maximum number of calls in January and July.

Now we'll explore the number of calls made for each of the EMS, Fire and Traffic reasons month-wise.

#count of each EMS calls month wise
EMS=calls[calls.Reason.isin(["EMS"])]
pvt3=pd.pivot_table(EMS,index="title",columns="Month",values="Date",aggfunc=len,fill_value=0)
pvt3["Total"]=round(pvt3.sum(numeric_only=True, axis=1),2)
df3=pvt3.sort_values(by="Total", ascending=False).head(5)
eg=df3.drop("Total",axis=1).transpose().reindex(months)
eg.plot.line()
plt.gcf().set_size_inches(15,7)
plt.title('Frequency of Top 5 EMS Situations Monthly')
#count of each Fire calls month wise
Fire=calls[calls.Reason.isin(["Fire"])]
pvt4=pd.pivot_table(Fire,index="title",columns="Month",values="Date",aggfunc=len,fill_value=0)
pvt4["Total"]=round(pvt4.sum(numeric_only=True, axis=1),2)
df4=pvt4.sort_values(by="Total", ascending=False).head(5)
fg=df4.drop("Total",axis=1).transpose().reindex(months)
fg.plot.line()
plt.gcf().set_size_inches(15,7)
plt.title('Frequency of Top 5 Fire Situations Monthly')
#count of each Traffic calls month wise
Traffic=calls[calls.Reason.isin(["Traffic"])]
pvt5=pd.pivot_table(Traffic,index="title",columns="Month",values="Date",aggfunc=len,fill_value=0)
pvt5["Total"]=round(pvt5.sum(numeric_only=True, axis=1),2)
df5=pvt5.sort_values(by="Total", ascending=False).head(5)
tg=df5.drop("Total",axis=1).transpose().reindex(months)
tg.plot.line()
plt.gcf().set_size_inches(15,7)
plt.title('Frequency of Top 5 Traffic Situations Monthly')

Frequency of Top 5 EMS Situations Monthly

Frequency of Top 5 Fire Situations Monthly

Frequency of Top 5 Traffic Situations Monthly

Conclusion

  • For the time period we analyzed, the residents of Montgomery County made the most 911 calls for medical emergency (EMS) reasons.
  • The top 3 towns with the highest number of calls were LOWER MERION, ABINGTON and NORRISTOWN.
  • Overall, the maximum number of calls were made on weekdays between 15:00 and 17:00 hrs. In terms of months, the frequency of calls went up in January and July.
  • The frequency of fire calls remained constant throughout the week.
  • The maximum number of calls for fire and EMS were made in July, and for traffic in January.
  • The frequency of EMS calls was highest on Mondays.
  • Most of the traffic calls were made on Wednesdays.
  • The top 5 EMS situations were RESPIRATORY EMERGENCY, CARDIAC EMERGENCY, FALL VICTIM, VEHICLE ACCIDENT and SUBJECT IN PAIN.
  • The top 5 fire situations were ALARM, VEHICLE ACCIDENT, FIRE INVESTIGATION, GAS-ODOR/LEAK and BUILDING FIRE.
  • The top 5 traffic situations were VEHICLE ACCIDENT, DISABLED VEHICLE, ROAD OBSTRUCTION, HAZARDOUS ROAD CONDITIONS and VEHICLE FIRE.

For further reference on the code, click here
