Exercise 8: Dinosaur park

Instructions:

This document contains information, questions, R code, and plots.

Hints and reminders are in bold.

Questions appear in blue.

This exercise needs to be completed and handed in by 23:59 on 24th March.


Rationale:

This week we want you to make some decisions about when to do each analysis. Therefore, we have put the more detailed instructions and code in separate HTML files rather than here in the exercise.

Resources:

R this week:

All R parts are explained in the extra code HTML files.

You might need to install the bestglm package.
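A minimal sketch of one way to install the package only when it is missing. The CRAN mirror URL is an assumption; any mirror you normally use would do.

```r
# Install bestglm only if it is not already available on this machine.
# The repos URL is an assumed CRAN mirror; swap in your usual one.
if (!requireNamespace("bestglm", quietly = TRUE)) {
  install.packages("bestglm", repos = "https://cloud.r-project.org")
}

# Load the package for this session.
library(bestglm)
```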


The challenge: How can we create the best dinosaur exhibit at a new zoo?

You are the board of directors of a new zoo opening in Norway. You want your zoo to be both exciting and educational, to teach visitors about all different kinds of plants and animals from throughout time. You have been very excited by new advances in cloning technology that you saw in Jurassic Park. You have set up a team of biologists to try and clone some dinosaurs to complete your “Ancient History” exhibit. They have been trialling different cloning techniques to try and work out the best protocol.

You have also set up another team to investigate public opinion of dinosaurs. It is very expensive to clone and to keep dinosaurs so you want to make sure that you are investing in the right ones.

Both teams have sent their results back to you. Now you must analyse the data and decide on how to set up your dinosaur exhibit.

Your job is to find out how to use resources most efficiently to create an exciting dinosaur exhibit.


Part A: Setting aims

This part is a bit different to previous exercises. We realise that we have been giving you feedback and solutions without asking you what is most useful! This week we will take the chance to ask for your engagement with feedback and improvement. We will begin and end with questions about your group's development and about what feedback would help you reach the level you want to get to.

A1. Go back through the exercises you have had corrected by the teaching assistants and look at the solutions (they are on the webpage). Based on looking at these, choose (and write down) two things you want to work on improving for this week's exercise.

We've chosen our goals, now let us do the exercise!


Part B: Concepts

B1. Why do we have model selection in statistics?

B2. What are two of the different aims of model selection?

B3. How do you perform model selection for each of these? I.e. name the technique; you do not need to give all the details.


Part C: Which variables influence cloning success?

These data have been collected by your team of scientists in the cloning facility. They have been trying to clone several different species of dinosaur using fossils of different ages and different lab procedures. They have recorded the 'success' of the cloning as an index created from the number of viable embryos created, longevity of embryos, and the cost of the cloning method. The index has positive and negative values, positive indicating greater success from the investment. This is called SuccessIndex in the data and is the response variable.

The explanatory variables they collected are: Age, the age of the fossil being cloned in million years; Size, the average adult body weight of the dinosaur species being cloned in metric tons; and Herbivore, an indicator of whether the species is a herbivore (TRUE) or a carnivore (FALSE).

Think about what kind of data (continuous or categorical) each of these is. It will help you with interpreting.
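As a rough sketch, you can ask R how it has stored each column. The data frame below is a toy example with made-up values (only the column names come from the description above), and the object name toy is just an assumption for illustration. The same calls can be used on the real data once it is imported.

```r
# Toy data with the same column names as the cloning data.
# The values are made up for illustration only; they are NOT the real data.
toy <- data.frame(
  SuccessIndex = c(-1.2, 0.4, 2.1),
  Age          = c(150, 70, 200),
  Size         = c(5.0, 0.8, 12.3),
  Herbivore    = c(TRUE, FALSE, TRUE)
)

# Class of each column: "numeric" columns are usually continuous,
# while "logical", "factor" or "character" columns are categorical.
sapply(toy, class)

# str() gives the same information plus the first few values.
str(toy)
```

How R stores a column is only a starting point; you should still check that it matches what the variable actually measures.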

It is thought that some of these variables might explain the variation in cloning success index. But it is not yet known which.

The dataset for this question can be found at https://www.math.ntnu.no/emner/ST2304/2020v/Week10/CloneData_2020v.csv

As always, the first step is to import the data and assign it to an object. You can use the whole web link above to import the data. It is a csv file with column names (header) included.
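The import step could look something like the sketch below. The object names clone_url and CloneData are assumptions (you can call them whatever you like), and the code assumes the web link is reachable from your computer.

```r
# Full web link given in the exercise text.
clone_url <- "https://www.math.ntnu.no/emner/ST2304/2020v/Week10/CloneData_2020v.csv"

# header = TRUE because the first row of the file holds the column names.
CloneData <- read.csv(clone_url, header = TRUE)

# Quick checks that the import worked as expected.
head(CloneData)
str(CloneData)
```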

C1. Look at the question at the start of this section (in the title of Part C). Is this question confirmatory or exploratory? Why?

Based on your answer to question C1, open the appropriate help HTML for this section.

C2. Conduct model selection for answering “which variables influence cloning success?” To answer this question, include a bullet point list of the steps you take to do this. You can include a line or two of R code with each bullet point, but you should not need a lot.

C3. Interpret the results from the model selection. Include reference to model selection and the final model you end up with. I.e. you should also mention what effect any variables have.


Part D: Does the size of a dinosaur affect its popularity?

These data were collected from a large survey of the general public. The participants were asked to rate, on a continuous scale (0-100), how much they liked different dinosaur species. The species all differed in size. The board members (your team) think that visitors to your zoo will be more excited to see bigger dinosaurs because bigger dinosaurs are more popular.

The dataset has two columns: PopularityScore, the popularity score of the dinosaur species, and Weight, the weight of the species in metric tons (a measure of size).

The data for this question can be found at https://www.math.ntnu.no/emner/ST2304/2020v/Week10/DinoData_2020v.csv

As always, the first step is to import the data and assign it to an object. You can use the whole web link above to import the data. It is a csv file with column names (header) included.
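A possible sketch of the import for this part, followed by a quick look at the raw data. The object names dino_url and DinoData are assumptions, the column names are taken from the description above, and the plot is only a first check of the data, not part of the model selection itself.

```r
# Full web link given in the exercise text.
dino_url <- "https://www.math.ntnu.no/emner/ST2304/2020v/Week10/DinoData_2020v.csv"

# header = TRUE because the first row of the file holds the column names.
DinoData <- read.csv(dino_url, header = TRUE)

# Numerical summary of each column.
summary(DinoData)

# Scatter plot of the response against the explanatory variable.
plot(PopularityScore ~ Weight, data = DinoData,
     xlab = "Weight (metric tons)",
     ylab = "Popularity score (0-100)")
```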

D1. Look at the question at the start of this section (in the title for Part D). Is this question confirmatory or exploratory? Why?

D2. Conduct model selection for answering “does the size (weight) of a dinosaur affect its popularity?” To answer this question, include a bullet point list of the steps you take to do this. You can include a line or two of R code with each bullet point, but you should not need a lot.

D3. Interpret the results from the model selection. Include reference to model selection and the final model you end up with. I.e. you should also mention what the effect of any variables is.


Part E: Recommendation

E1. Based on all of your results, what would you recommend as a way to create an efficient and exciting exhibit?


Part F: Reflection and feedback

F1. Look back at your answer to A1. How well do you think you did at improving the areas you chose to work on? Give a sentence or two for each.

F2. What would you like feedback on this week?

This question (F2) might be hard to answer, but it will be really helpful if you do. It might be that you want feedback on how well you improved the areas you mentioned in A1. On the other hand, you might be very confident that your work on those parts was good and you might prefer feedback on what other areas you can improve in future. You may want feedback on interpretation, or concepts, or R code, or on everything. It is up to your group what will be most helpful! You will still get a solution to this exercise so you can see a model answer. Therefore, choose feedback that will help beyond the solution.

Feedback should help you improve, so let us know what will be most helpful.