Exercise 4: Model checking

Instructions:

Hints and reminders are in italics.

Questions appear in blue.

Needs to be completed and handed in by 21st February


Resources:


The challenge: How many police officers do you need?

You are part of the police headquarters team in Chicago. It is predicted to be a very cold weekend, with an average temperature of -10°C. Obviously most of the police officers would like to make the most of the cold weather by going skiing and ice skating. But you also need to keep the public safe. Previous research has shown that approximately 2 police officers (they work in pairs) are needed for every 20 daily crimes. The Chief of Police has asked your team to provide a recommendation for how many officers they need for this cold weekend.

Luckily (as always) you already have data on temperature and crime numbers that you can use.

It is your job to find out how many police officers you would recommend to be on duty on Saturday.



The data from last week can be found at https://www.math.ntnu.no/emner/ST2304/2019v/Week5/NoFreezeTempCrimeChicago.csv

The first step is to import the data and assign it to an object. You can use the whole web link above to import the data. It is a csv file with column names (header) included.
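
A minimal sketch of the import step. So that it runs without a network connection, it reads a small temporary file containing two made-up rows; the column names Temp and Crime are assumptions, so check the real names after importing. For the exercise itself, swap the temporary file for the web link.

```r
# Hypothetical stand-in file: two invented rows with assumed column names.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Temp,Crime", "5.2,710", "12.8,790"), tmp)

# read.csv() takes a local path or a full URL.
# header = TRUE uses the first row as column names.
CrimeData <- read.csv(tmp, header = TRUE)

# For the exercise, use the web link instead of tmp:
# CrimeData <- read.csv("https://www.math.ntnu.no/emner/ST2304/2019v/Week5/NoFreezeTempCrimeChicago.csv",
#                       header = TRUE)

# Check what was imported: column names, types, first values
str(CrimeData)
```

Looking at the structure with str() or head() straight after importing is a quick way to confirm that the header was read correctly.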

The next step is to plot the data; it's good to remind ourselves what it looked like. The line of the linear regression from last week is also included.

[Plot: last week's data with the linear regression line]

1. Using the linear regression model from last week, predict the number of daily crimes for an average temperature of -10°C. Report the answer with a prediction interval.

Hint 1: you will need to use the predict() function: create a newdata data frame, then predict with a prediction interval.

Hint 2: for lm() you will need to use the format lm(y ~ x, data = YourData)
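
The two hints can be sketched together as follows. This is an illustration on simulated data, not the course data: the data frame DemoData, its column names Temp and Crime, and all the numbers are made up, so your own output will differ.

```r
# Simulated stand-in data (NOT the Chicago data): temperatures above zero only
set.seed(1)
DemoData <- data.frame(Temp = seq(1, 30, length.out = 50))
DemoData$Crime <- 600 + 5 * DemoData$Temp + rnorm(50, sd = 20)

# Fit the linear regression in the format lm(y ~ x, data = YourData)
DemoModel <- lm(Crime ~ Temp, data = DemoData)

# newdata must be a data frame whose column name matches the
# explanatory variable used in the model formula
NewTemp <- data.frame(Temp = -10)

# interval = "prediction" returns the point prediction (fit) and the
# lower (lwr) and upper (upr) limits of the prediction interval
DemoPrediction <- predict(DemoModel, newdata = NewTemp, interval = "prediction")
DemoPrediction
```

By default predict() gives a 95% interval; the level argument changes this.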

2. Based on this result, how many police officers would you recommend to be on duty? Explain the reasons behind your answer. Include prediction intervals.
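
One possible way to turn a number of daily crimes into a number of officers, using the ratio from the introduction (2 officers for every 20 daily crimes). The function name officers_needed is hypothetical, and rounding up to whole pairs is an assumption; you may argue for a different rule in your answer.

```r
# Hypothetical helper: convert daily crimes into officers on duty.
# Ratio from the exercise text: 2 officers (one pair) per 20 daily crimes.
# Assumption: officers only work in complete pairs, so round up.
officers_needed <- function(daily_crimes) {
  pairs <- ceiling(daily_crimes / 20)
  2 * pairs
}

# Made-up example values, not predictions from the data
officers_needed(c(100, 101))
```

With these made-up inputs, 100 daily crimes need 10 officers and 101 need 12. Applying the function to the lower and upper limits of a prediction interval, as well as to the point prediction, gives a range of officer numbers to discuss.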


You are speaking to a colleague at lunch, discussing this project (because you are very excited about it). Your colleague tells you they have some extra data. They have mean daily crime numbers from temperatures under 0°C. This is just what you needed!

You can find the new complete dataset here:

https://www.math.ntnu.no/emner/ST2304/2019v/Week6/TempCrimeChicago.csv

It is important to import the data and plot it.

3. Fit a linear regression to the complete dataset, including days below 0°C, and plot the regression line.

Remember: lm(y ~ x, data = YourData) and abline()
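
A short sketch of how lm() and abline() fit together, again on simulated data. DemoAll, Temp and Crime are made-up names, and the simulated relationship is a straight line by construction, so the plot will not look like the real data.

```r
# Simulated stand-in data (NOT the Chicago data), including days below zero
set.seed(2)
DemoAll <- data.frame(Temp = seq(-20, 30, length.out = 60))
DemoAll$Crime <- 600 + 5 * DemoAll$Temp + rnorm(60, sd = 20)

# Fit the linear regression
DemoFit <- lm(Crime ~ Temp, data = DemoAll)

# Scatter plot of the data, using the same formula as the model
plot(Crime ~ Temp, data = DemoAll,
     xlab = "Temperature", ylab = "Daily crimes")

# abline() reads the intercept and slope straight from the fitted model
abline(DemoFit, col = "red")
```

Passing the model object to abline() avoids typing the coefficients by hand, so the line always matches the model you actually fitted.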

4. Look at your plot of the data and regression line. What do you think of the fit? Are there any problems with using a straight line on this data? Do this by eye, not with analyses.

You have had a go at checking this model just by looking at the data and the regression line. But there are some more thorough ways we can explicitly check whether our model meets the assumptions of a linear regression.

The four graphs that statisticians typically use are called: Residuals vs fitted, Normal Q-Q, Scale-Location, and Residuals vs leverage.
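
As a rough sketch, all four of these graphs can be drawn at once by calling plot() on a fitted model. The example uses simulated data with made-up names (DemoData, Temp, Crime), not the course data.

```r
# Simulated stand-in data (NOT the Chicago data)
set.seed(3)
DemoData <- data.frame(Temp = seq(-20, 30, length.out = 60))
DemoData$Crime <- 600 + 5 * DemoData$Temp + rnorm(60, sd = 20)
DemoModel <- lm(Crime ~ Temp, data = DemoData)

# Split the plotting window into a 2 x 2 grid, keeping the old settings
OldPar <- par(mfrow = c(2, 2))

# plot() on an lm object draws the four default diagnostic plots
plot(DemoModel)

# Restore the previous plotting layout
par(OldPar)
```

In the questions below the plots are built one at a time instead, which makes it easier to see what each one is made of.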

5. Create a residuals vs fitted plot for the linear model on all the data. Interpret the plot in terms of model fit.

Think about which assumption this plot assesses, what you expect it to look like if the assumption is met, and how your data differ from this.

# I have called my model model2, replace this with the name 
# of your model

# create a vector of rounded residuals
CrimeResiduals <- round(residuals(model2),2)

# create a vector of rounded fit
CrimeFitted <- round(fitted(model2),2)

# plot the fitted and residuals
plot(CrimeFitted, CrimeResiduals)
# add a horizontal line at 0
# line is grey and dashed (lty=2)
abline(h=0, lty=2, col="grey")

[Plot: residuals vs fitted values]

6. Create a Normal Q-Q plot for the linear model on all the data. Interpret the plot in terms of model fit.

Again: Think about which assumption this plot assesses, what you expect it to look like if the assumption is met, and how your data differ from this.

# use the residuals you calculated above 
# now create the Normal Q-Q plot 
qqnorm(CrimeResiduals)
# add the ideal line
qqline(CrimeResiduals)

[Plot: Normal Q-Q plot of the residuals]

7. Interpret the Cook's distance plot below. What does it tell you that the Residuals vs fitted and Normal Q-Q plots did not?

# Then create the last plot for model checking.
# This is done by using the plot function on the model
# (which by default would create 4 plots) and asking for
# just the Cook's distance plot (which=4). A bit of a cheat to a nice plot.
plot(model2, which=4)

[Plot: Cook's distance for each observation]
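
If you want the numbers behind a plot like this, the cooks.distance() function returns one value per observation. The sketch below uses simulated data in which observation 30 has deliberately been made an outlier; the names DemoData, x and y are made up.

```r
# Simulated stand-in data (NOT the Chicago data)
set.seed(4)
DemoData <- data.frame(x = 1:30)
DemoData$y <- 2 * DemoData$x + rnorm(30, sd = 1)

# Deliberately plant one extreme point at the end of the x range
DemoData$y[30] <- DemoData$y[30] + 50

DemoModel <- lm(y ~ x, data = DemoData)

# One Cook's distance per observation
DemoCooks <- cooks.distance(DemoModel)

# Row number of the observation with the largest Cook's distance
MostInfluential <- unname(which.max(DemoCooks))
MostInfluential
```

Here the planted point (row 30) has the largest Cook's distance, because it combines a large residual with an extreme x value.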

8. Based on your assessments of the model checking plots, how could you improve the model fit for these data? (suggest at least 2 things you could do - try not to cheat)

Once you have had a go at question 8, click here to continue with the exercise:

https://www.math.ntnu.no/emner/ST2304/2019v/Week6/Exercise4_ctnd.html