[SOLVED] EMATM0061 Statistical Computing and Empirical Methods TB1 2024 Assignment 3 Processing | Assignment Chef (2024)

5/5 - (1 vote)

Assignment 3

EMATM0061: StatisticalComputing andEmpiricalMethods, TB1,2024

Introduction

Create an R Markdown for the assignment

First, it is recommended that youcreate asingleRMarkdowndocumenttoincludeyour solutions, with headingscreated by headingcodessuch as“##1.1(Q1)”,“##3(Q1)”,etc.

It is a good practice touseRMarkdown toorganiseyourcodeandresults.For example,youcanstartwiththetemplatecalledAssignment02_TEMPLATE.Rmdwhich can be downloaded viaBlackboard.

You canoptionally hand in this assignmentby13:00Monday7October.This will help us understand your work but will notcount towards yourfinalgrade.Ifyou want to hand in the assignment, pleaseuploadaPDFfilecontainingyouranswerstoBlackboards(click on the “Assignment 03” under theassignment tab).Thereisno requirement on how thePDF file isgenerated.Oneexampleis tochoosetheoutputof R-markdown asPDF(which may requireLaTex to beinstalledin yourcomputer).Another example is to choose ahtmloutputatR-markdownandconvertthehtml file into aPDF file.Ifyou have multiplePDFfiles,pleasecombinethemintoasinglePDF file before thesubmission.

Load packages

We need to load two packages, namely“Stat2Data”and“tidyverse”,before answering the questions.Ifthey haven’t been installedon yourcomputer,pleaseuse“install.packages()” toinstallthemfirst.

1. Load the tidyverse package:

library(tidyverse)

2. Load theStat2Data package and then thedatasetHawks:

library(Stat2Data)

data(“Hawks”)

1. Visualisation

This part is mainly about visualisation usingggplot2.ItcoversmainlyLecture7.We are going to use the “Hawks”datasetthatwasusedlastweek.

(Q1) After the dataset Stat2Data was loadedsucesfully,createasmallerdatasetwith the codebelow.

hawksSmall<-

drop_na(select(Hawks,Age,Day,Month,Year,CaptureTime,Species,Wing,Weight,Tail))

Use the dim() function to checkhowmanyrowsandhowmanycolumnsinhawksSmall. Then use the head() function todisplay thefirst5rowsofthehawksSmal. The result shouldlook likethis:

##

Age

Day

Month

Year

CaptureTime

Species

Wing

Weight

Tail

## 1

I

19

9

1992

13:30

RT

385

920

219

## 2

I

22

9

1992

10:30

RT

376

930

221

## 3

I

23

9

1992

12:45

RT

381

990

235

## 4

I

23

9

1992

10:50

CH

265

470

220

## 5

I

27

9

1992

11:15

SS

205

170

157

(Q2) Use acombination ofthe functions“ggplot()” and“geom_histogram”tocreate a histogram plot ofthe weights of Hawks withinthehawksSmalldataframewithbinwidths of 10. Your result should looksomethinglikethefollowing.

(Q3) Similar to(Q2),use the “geom_density()” function to createtwodensityplotsofthe tail lengths of Hawks within the hawksSmalldata frame,withparameters “adjust=0.5” and “adjust=2”, respectively. Your resultsshouldlook likethis:

Can you explain the difference between the twoplots?Howmanymodesdoesthedistributionof Hawk tail lengths have ineach ofthe plots.

(Q4) Basedon the answer from(Q3),can youcreate the followingplot?

(Q5) Violin plot:

Use the ggplot and geom_violin() functions tocreate thefollowingviolinplotforthethree species.

(Q6) Scatter plot

Generate a plot similar to the followingplotusingtheggplot()andgeom_point()functions.

1. How many aesthetics are present within thefollowingplot?

2.What are the glyphs within thisplot?

3. Whatare thevisualcuesbeingusedwithinthisplot?

(Q7) Trend lines andfacetwraps:

Generate the following plot using the ggplot(),geom_point(),geom_smooth()andfacet_wrap() functions.Note that in the facet plot, thethreepanelsusedifferentscales.

1. Whatare thevisualcuesbeingusedwithinthisplot?

2. Basedon the plot below, what can we sayabouttherelationshipbetweentheweight ofthe hawks and theirtaillengths?

(Q8) Adding annotations

First,compute the “Weight” and the “Tail”oftheheaviesthawkin thedataset.Youcan use filter() and select()function toselectproperdata.

Second, reuse the code that you createfromQ(3),addinganarrowandan annotation to indicate the heaviest hawk. Your resultshouldlooksimilartothis:

2. Finite probability spaces

Now, let’s move on to considerprobabilityonfinitesamplespaces,whichiscoveredinLecture8.

2.1 Sampling with replacement

Recall that for positive integers nand k, givesthenumberof subsetsofsize kfrom asetof nobjects.

You cancompute this number straightforwardly within“R” via thechoosefunction “choose()” .For example, ifwe want to compute thenumberof differentsubsetsofsize3 from a collection of size8 wewouldcompute:

choose(8,3)

##[1]56

Suppose we have a bag containing10spheres. Thisincludes3redspheresand7blue spheres.

Let’s suppose that we draw asphereatrandomfrom thebag(allsphereshaveequal probability ofbeing drawn). We record itscolourandthenreturnthespheretothebag. This process is repeated22 times.Thisisanexampleof sampling with replacement since the spheres are replacedaftereachdraw.

(Q1) Write down a mathematical expressionfortheprobabilitythatZoutofthe22selections were redspheres(here Z ∈{0,1, ⋯ ,22}).

You can try writing down your mathematicalexpressionusing“LaTex”codein yourR markdown file, making useoftheLaTex functions“binom{”{}}and“frac{”{}}.Forinformation about writing mathematical formulae inRmarkdowndocuments,findillustrative examples in “Assignment_R

MarkdownMathformulasandSymbolsExamples.rmd”(under the resource list tab onblackboardcourse webpage) or visit:

https://bookdown.org/yihui/bookdown/markdown-extensions-by- bookdown.html#equations

(Q2) Next write anR function called“prob_red_spheres()” which takesZasan argument andcomputes the probability that Zout of a totalofthe22 ballsselectedarered.

Test your function asfollows:

prob_red_spheres(10)

##[1] 0.05285129

(Q3) Generate a data frame. called“prob_by_num_reds” withtwocolumns “num_reds” and“prob”. The “num_reds”column shouldcontainnumbers1 through22 and the “prob”column should givetheassociatedprobabilityof selecting that many reds out of a totalnumberof 22selections.Display the first3 rowsofyourdataframe.:

prob_by_num_reds %>% head(3)

##num_redsprob

## 110.003686403

##22 0.016588812

##33 0.047396606

(Q4) Now use the “geom_line()” function within the“ggplot2”library,inconjunctionwith your data frame. todisplay a plotoftheprobabilityasafunctionofthenumberof reds.

Your plot should lookasfollows:

(Q5)

Next we shall explore the “sample()”function withinR.Let’ssuppose wewanttosimulate a random experiment in which wesamplewithreplacementfroma

collection of 10objects, and repeat thisprocess22times.

Wecando this bycalling:

sample(10, 22,replace=TRUE)

##[1]5 86104531651094576771015

5

Try this out for yourself. The outputshouldbea vectorof length22consisting entirely of numbers between1 and10.Since this issampling withreplacementsandthe number of samples exceeds the numberofelements, therewillberepetitions.

Try rerunning the function. You probably getadifferentsample.Thisis tobe expected, andeven desirable, since thefunction simulatesarandomsample.

However, the fact that we get a differentanswereverytimewerunthecodeis problematic from the perspective ofreproducibility. To avoidthisprocesswecanset a random seed via the functionset.seed().Bydoingsoweshouldgetthesameoutputeverytime. Trythefollowingoutforyourself:

## case 1: Setting the random seed just once

set.seed(0)

for(i in 1:5){

print(sample(100,5,replace=FALSE))

# The result may well differ every time

}

## case 2: Resetting the random seed every time

set.seed(1)

print(sample(100,5,replace=FALSE)) set.seed(1)

print(sample(100,5,replace=FALSE)) set.seed(1)

print(sample(100,5,replace=FALSE))

# The result should not change

## case 3: reproducing case 1 if we set a random seed at the beginning.

set.seed(0)

for(i in 1:5){

print(sample(100,5,replace=FALSE))

} # The result will be 5 samples exactly the same as in case 1 (why?).

Next, try to understand what the following codedoes

itermap <- function(.x,.f){result<- list()

for (item in .x){

result <- c(result, list(.f(item)))}

return(result)}

itermap( c(1,2,3), function(x){ return(c(x,x^2))})

##[[1]]

##[1]11##

##[[2]]

##[1]24##

##[[3]]

##[1]39

itermap_dbl <- function(.x,.f){result <- numeric(length(.x))

for (i in 1:length(.x)){

result[i] <- .f(.x[[i]])}

return(result)}

itermap_dbl( c(1,2,3), function(x){ return(x^3)})

##[1]1827

The function “itermap” and“itermap_dbl” definedabovearesimilar to the“map()”and“map_dbl” functions inR that will be introducednext week.

We shall now use the “sample()”toconstructasimulationstudytoexplorethe

probabilityof selecting z red balls from a bagof size10,with3redand7blueballs,when sampling22 balls with replacement.

First set a random seed.Thencreateadataframecalled

“sampling_with_replacement_simulation”consistingoftwocolumns. Thefirstis called“trial” andcontains numbers1 through1000. The secondiscalled

“sample_balls” andcorresponds to random samples of size22froma bagof size10,with replacement. We cando thisasfollows:

num_trials<-1000 # set the number of trials set.seed(0) # set the random seed

sampling_with_replacement_simulation<-data.frame(trial=1:num_trials) %>%

mutate(sample_balls = itermap(.x=trial, function(x){sample(10,22,replace =TRUE)}))

# generate collection of num_trials simulations

Now add a new columncalled“num_reds”such that,foreachrow,“num_reds”

contains an integer which gives the numberof items withinthesamplefor thatrow(i.e., the sample represented by theentry ofthe“sample_balls”column)whichare

less thanor equal to three(assuming that the threered ballsarelabelledby{1,2,3}).For example, suppose that for somerowofthedataframe,the“sample_balls”

column contains aentry which is thefollowinglist:

410410835555107121105657110

Then the corresponding row ofthe“num_reds”column shouldcontainthenumber5, since5 ofthese values are less thanequalto3.Youmaywanttousethefunctions “mutate()”, “itermap_dbl” and “sum()”. After performing this step,thedataframe.

“sampling_with_replacement_simulation” should contain a newcolumncalled “num_reds”.

(Q6)

Next we shall add a newcolumncalled“predicted_prob” toourexistingdataframe. “prob_by_num_reds” which gives the number of times (that weobserved the

corresponding number of reds within our simulation)divided bythetotalnumberof observations. This aims to estimate the probabilityoftheeventthatZoutof a

totalofthe22 balls selected are red(for Z = 1,2, ⋯,22). Wecando this asfollows:

num_reds_in_simulation<-sampling_with_replacement_simulation %>% pull(num_reds)

# we extract a vector corresponding to the number of reds in each trial

prob_by_num_reds<-prob_by_num_reds %>%

mutate(predicted_prob=itermap_dbl(.x=num_reds, function(.x)

sum(num_reds_in_simulation==.x))/num_trials)

# add a column which gives the number of trials with a given number of reds

Here “prob_by_num_reds” was initially defined in(Q3).

(Q7) Finally,create a plot which compares the resultsofyoursimulationwithyourprobability formula.

Your result should look something liketheplotbelow.Of course,since thisisarandom simulation, your resultmay well lookslightlydifferent.

prob_by_num_reds %>%

rename(TheoreticalProbability=prob,

EstimatedProbability=predicted_prob) %>%

ggplot() + geom_line(aes(x=num_reds, y=TheoreticalProbability)) +

geom_line(aes(x=num_reds, y=EstimatedProbability), linetype=’dashed’)

+

geom_point(aes(x=num_reds, y=EstimatedProbability)) +

theme_bw() + xlab(“Number of reds”) + ylab(“Probabilities”)

Try to make senseof each linein theabovecode.

Notice the close relationship between probabilitiesand long-runfrequencybehaviour. We shall explore this connectionover thenextfewlectures.

2.2 Sampling without replacement

This is a more challengingquestion,but theideasfromthelastquestionareusefulhere.

Let’s suppose we have a large bagcontaining100spheres.Thereare50redspheres,30 blue spheres and20 green spheres.Supposethatwesample10spheresfromthebag without replacement.

(Q1) What is the probability that oneormorecoloursaremissingfrom your

selection?First aim to answer this question viaasimulationstudy usingideasfromthe previousquestion.

You may want to use thefollowingsteps:

1. First setarandomseed;

2. Next set a numberoftrials(e.g.,10 or1000.Trythisinitially withasmallnumber of simulations.Increase your number of simulations to about a relatively large number to get amoreaccurateanswer,onceeverything seems to be working well), and asamplesize(10);

3. Now use a combination ofthefunctions“sample()”,“mutate()”and

“itermap()” to generate your samples.Here you are creatingasampleof size 10 from a collection of100 balls–thesamplingisdone without replacement;

4. Now compute the number of “reds”,“greens” and“blues”inyoursampleusing the “itermap_dbl()” and “mutate()”functions.

5. Computetheminimumofthethreecountsusingthe“pmin()”function.Whenthis minimum is zero, thenone ofthethreecoloursismissing.Itis

recommendedthatyoulookupthedifferencebetween pmin()”and“min()”here;

6. Computetheproportionof rowsfor whichtheminimumnumberofthethreecounts iszero.

(Q2) (*Optional) Thisquestion is optionalandmaybeomittedifyouareshortontime.

Let’s derive a mathematical expressionfor theprobabilitythatweconsideredin

(Q1): You can try and use “combinations”with(k(n))tocomputetheprobability

directly.First aim tocompute the number of subsetsof size10 from100 which

eitherentirely missout oneofthe subsetsRed={1, ⋯ ,50},Blues={51, ⋯ ,80},

Greens={81, ⋯,100}. Thencompute the numberof subsetsof size10 from100

which missout twoofthe subsetsRed,Blues,Green.Be careful nottodoublecount some ofthese subsets! Once you havecomputedallsuchsubsets,combinethemwith the formula for the total numberof subsetsof size10fromasetof100, to compute the probability of missing acolour.

Once you have the mathematical expression fortheprobability,youcancheckhowclose the probabilitycomputed in (Q1) to thetheoretical values.

[SOLVED] EMATM0061 Statistical Computing and Empirical Methods TB1 2024 Assignment 3 Processing | Assignment Chef (2024)
Top Articles
Latest Posts
Recommended Articles
Article information

Author: Margart Wisoky

Last Updated:

Views: 5791

Rating: 4.8 / 5 (58 voted)

Reviews: 89% of readers found this page helpful

Author information

Name: Margart Wisoky

Birthday: 1993-05-13

Address: 2113 Abernathy Knoll, New Tamerafurt, CT 66893-2169

Phone: +25815234346805

Job: Central Developer

Hobby: Machining, Pottery, Rafting, Cosplaying, Jogging, Taekwondo, Scouting

Introduction: My name is Margart Wisoky, I am a gorgeous, shiny, successful, beautiful, adventurous, excited, pleasant person who loves writing and wants to share my knowledge and understanding with you.