---
title: "Labwork 3. Hypothesis Testing. Analysis of Variance"
author: "Kristupas Paulauskas, F5545"
output:
  html_document:
    df_print: paged
editor_options:
  markdown:
    wrap: 72
---
```{r setup, include=FALSE}
## other output files: html_notebook for notebook ; html_document for html file; pdf_document for PDF file
```
# DESCRIPTION
Objective: To develop the ability to apply hypothesis testing methods
and analysis of variance (ANOVA) to evaluate statistical differences
between data groups and draw evidence-based conclusions.
# 1. Goodness-of-fit distributional testing
### TASK: Apply Goodness-of-fit tests to check whether the sample X follows some specific distribution.
- Import data from Moodle, file Lab3_Dataset1.csv.
- Your data set is the n-th column of the imported file, where n is
the last digit of your student's vidko (ID).
```{r,message=FALSE, warning=FALSE}
# vidko (ID) = F5545
# data import
DataSet1 <- read.csv("Lab3_DataSet1.csv", header = TRUE)
```
## 1.1 Testing for Normal distribution
(2 points)
- Draw histogram of imported data sample and add the curve of normal
distribution density dnorm().
```{r message=FALSE, warning=FALSE}
X <- DataSet1[[5]] # column 5, since the last digit of vidko F5545 is 5
X <- X[is.finite(X)]
m <- mean(X)
s <- sd(X)
hist(X, freq = FALSE, col = "skyblue", border = "white", main = "Histogram with Normal Curve", xlab = "X")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2, col = "red")
```
- Determine mean and standard deviation.
```{r message=FALSE, warning=FALSE}
cat("Mean =", m, "\nSD =", s, "\n")
```
- Set significance level alpha=0.05
```{r}
alpha <- 0.05
```
- Testing for normality:
  - if vidko (ID) ends in 1-5, then apply the
Anderson-Darling Test (goftest::ad.test), Shapiro-Wilk Test
(shapiro.test) and Pearson's Chi-squared Test (chisq.test);
  - if vidko (ID) ends in 6-0, then apply the Kolmogorov-Smirnov Test
(ks.test), Shapiro-Wilk Test (shapiro.test) and Pearson's Chi-squared
Test (chisq.test).
```{r message=FALSE, warning=FALSE}
Z <- (X - m)/s # standardize, since ad.test with null = "pnorm" assumes the standard normal
ad_res <- goftest::ad.test(Z, null="pnorm")
sw_res <- shapiro.test(X)
h <- hist(X, plot = FALSE)
breaks_cdf <- pnorm(h$breaks, mean=m, sd=s)
null_probs <- zoo::rollapply(breaks_cdf, 2, function(x) x[2]-x[1])
chi_res <- chisq.test(h$counts, p=null_probs, rescale.p=TRUE, simulate.p.value=TRUE)
ad_res
sw_res
chi_res
```
- Answer the questions below.
```{r,message=FALSE, warning=FALSE}
# ad.test and ks.test require data to be standardized
## Chi-squared test template (below)
# intervals <- hist(data) # hist determines bins and counts the number of observations in the bins
# breaks_cdf <- pnorm(intervals$breaks, mean=mean(data), sd=sd(data)) # compute theoretical probabilities of normal distribution
# null.probs <- zoo::rollapply(breaks_cdf, 2, function(x) x[2]-x[1])
# chisq.test(intervals$counts, p=null.probs, rescale.p=TRUE, simulate.p.value = TRUE) # Chi2 test; simulate.p.value helps to achieve better approximation ; rescale.p = probabilities should sum up to 1
# qchisq(alpha,length(intervals$breaks)-2-1,lower.tail=FALSE) # critical value of chi2 distribution
```
[Formulate the hypothesis for Anderson-Darling Test / Kolmogorov-Smirnov
Test.
- H0: The sample comes from a normal distribution.
- Ha: The sample does not come from a normal distribution.
- Conclusion from testing:
If p-value > 0.05 -> fail to reject H0 -> the data do not contradict normality.
If p-value <= 0.05 -> reject H0 -> the data are not normally distributed.
]{style="color:blue"}
[Formulate the hypothesis for Shapiro-Wilk Test.
- H0: The data follow a normal distribution.
- Ha: The data do not follow a normal distribution.
- Conclusion from testing:
If p-value > 0.05 -> fail to reject H0 -> the data are consistent with normality.
If p-value <= 0.05 -> reject H0 -> the data are not normally distributed.
]{style="color:blue"}
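To make these decision rules concrete, here is a small helper (a sketch; it assumes the objects `ad_res` and `sw_res` from the testing chunk above) that compares each p-value with alpha:
```{r message=FALSE, warning=FALSE}
# Sketch: automate the p-value decision rule stated above
# (assumes ad_res and sw_res exist from the testing chunk)
decide <- function(p, alpha = 0.05) {
  if (p > alpha) "fail to reject H0" else "reject H0"
}
cat("Anderson-Darling:", decide(ad_res$p.value), "\n")
cat("Shapiro-Wilk:", decide(sw_res$p.value), "\n")
```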
[Formulate the hypothesis for Pearson's Chi-squared Test.
- H0: The data follow a normal distribution with parameters μ and σ
estimated from the sample.
- Ha: The data do not follow this normal distribution.
- How many bins did you get?]{style="color:blue"}
```{r message=FALSE, warning=FALSE}
k <- length(h$counts)
k
```
[- What is the meaning of pnorm()?
pnorm() is the cumulative distribution function (CDF) of the normal
distribution.
- Conclusion from testing:
If p-value > 0.05 -> fail to reject H0 -> the histogram frequencies
match a normal distribution.
If p-value <= 0.05 -> reject H0 -> the distribution differs from a
normal distribution.
- Find the quantile of the Chi-squared distribution qchisq() and draw a
conclusion for the hypothesis by comparing the test statistic and the
quantile.
To apply the critical value approach, we compute the quantile of the
$\chi^2$ distribution:]{style="color:blue"}
- Number of bins (k): `length(h$counts)`
- Estimated parameters: 2 (mean and standard deviation)
- Degrees of freedom:
```{r message=FALSE, warning=FALSE}
df <- k - 1 - 2 # k bins, minus 1, minus 2 estimated parameters (mean, sd)
cat("Degrees of freedom =", df, "\n")
```
The critical value is obtained using:
``` {r message=FALSE, warning=FALSE}
cat("Critical Chi-square value =", qchisq(alpha, df, lower.tail = FALSE),"\n")
cat("Chi-square test statistic =", chi_res$statistic, "\n")
```
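To make the critical-value decision explicit, a short sketch comparing the statistic from the chunk above against the quantile:
```{r message=FALSE, warning=FALSE}
# Critical value approach: reject H0 if the chi-square statistic
# exceeds the upper-alpha quantile of the chi-square distribution
if (chi_res$statistic > qchisq(alpha, df, lower.tail = FALSE)) {
  cat("Statistic exceeds the critical value -> reject H0\n")
} else {
  cat("Statistic does not exceed the critical value -> fail to reject H0\n")
}
```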
## 1.2 Testing for other than Normal distribution
(1 point)
Assume that data could be described using Student's t distribution.
- Apply any one goodness-of-fit test to check whether sample X
can be described using Student's t distribution.
- Repeat the same steps following task 1.1.
  - p.s. add the curve of Student's t distribution dt() instead of the
normal distribution.
- Answer the questions below.
```{r,message=FALSE, warning=FALSE}
# for Student's t you will need degrees of freedom (dof): params<-QRM::fit.st(data); dof<-params$par.ests[["nu"]]
data <- X
Z_t <- as.numeric(scale(data))
library(QRM)
params<-QRM::fit.st(data)
dof <-params$par.ests[["nu"]]
# Plot histogram and t curve
hist(Z_t, freq = FALSE, col = "skyblue", border = "white",
main = "Histogram of Z with Student's t Curve",
xlab = "Z = (X - mean)/sd")
curve(dt(x, df = dof), add = TRUE, lwd = 2, col = "red")
# Kolmogorov-Smirnov test
ks_res_t <- ks.test(Z_t, "pt", df = dof)
ks_res_t
```
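Note that `scale()` standardizes with the sample mean/sd. As a hedged alternative (assuming `fit.st`'s `par.ests` also stores the fitted location `mu` and scale `sigma`, alongside `nu`), one could standardize with the fitted parameters instead:
```{r message=FALSE, warning=FALSE}
# Sketch: standardize with the fitted location/scale from fit.st
# (assumption: par.ests contains "mu" and "sigma" alongside "nu")
mu_fit <- params$par.ests[["mu"]]
sigma_fit <- params$par.ests[["sigma"]]
Z_fit <- (data - mu_fit) / sigma_fit
ks.test(Z_fit, "pt", df = dof)
```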
[What is the value of degrees of freedom?
Fitting the Student's t distribution with QRM::fit.st estimates the
degrees of freedom as
`df = 22718.6`,
and I used this value in the KS test and in the Student's t density curve.]{style="color:blue"}
[Formulate the hypothesis of the selected test.
- H0: The standardized sample
$Z = (X - \bar{X})/s$ follows a Student's t distribution with
df = 22718.6 degrees of freedom.
- Ha: The standardized sample
Z does not follow a Student's t distribution with df = 22718.6 degrees of freedom.
- Conclusion from testing: D = 0.012569, p-value = 0.7304, which means that we fail to reject H0 and conclude that the Student's t distribution with df = 22718.6 is consistent with the data.
]{style="color:blue"}
[Now choose degrees of freedom df=3. What results from the hypothesis did
you get?
The KS test gives D = 0.055262 and p-value = 2.204e-08.
Since the p-value is far below alpha = 0.05, we reject H0, so the
Student's t distribution with df = 3 does not describe the data well.
This confirms that the data are much closer to a normal distribution
(a t distribution with very large df) than to the heavy-tailed t with df = 3.]{style="color:blue"}
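The df = 3 figures quoted above should be reproducible with the same KS call used earlier:
```{r message=FALSE, warning=FALSE}
# Same KS test, now against a heavy-tailed t distribution with df = 3
ks.test(Z_t, "pt", df = 3)
```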
# 2. Parametric hypothesis about normal distribution parameters
We assume that X represents our population, and we draw smaller random
samples (e.g., Y and Z), each consisting of 100 observations from X.
While sampling, a different random generation seed should be set for Y
and Z.
```{r}
set.seed(2005)
Y <- X[sample(length(X), 100)]
set.seed(1995)
Z <- X[sample(length(X), 100)]
```
## 2.1 Parametric hypothesis about Y normal distribution parameter - mean $\mu$, when variance $\sigma^2$ is unknown
(1 point)
- Perform a two-tailed test and one one-tailed test following the theory
slides about the Normal distribution parameter $\mu$ when $\sigma^2$ is
not known, for data sample $Y$.
  - Assume that $\mu_0$ is the mean of $X$ (status quo).
  - For this purpose implement the procedure yourself and then use the
built-in function t.test.
  - Also, you can follow the examples from
- Answer the questions below.
```{r}
mu0<-mean(X) # hypothesized value
# Sample stats
n <- length(Y)
ybar <- mean(Y)
s <- sd(Y)
# Test statistic (t-value)
t_value <- (ybar - mu0) / (s / sqrt(n))
df <- n - 1
# p-values
p_two <- 2 * (1 - pt(abs(t_value), df))
p_left <- pt(t_value, df) # Y mean < mu0
p_right <- 1 - pt(t_value, df) # Y mean > mu0
# Built-in function
tt_two <- t.test(Y, mu = mu0, alternative = "two.sided")
tt_right <- t.test(Y, mu = mu0, alternative = "greater")
tt_left <- t.test(Y, mu = mu0, alternative = "less")
cat("t-statistic =",t_value)
cat("Two-tailed p-value =",p_two)
cat("t.test result:\n")
print(tt_two)
```
[Formulate hypothesis for the two-tailed test.
- H0: μ = μ0 = 158.9811
- Ha: μ != μ0
- Conclusion from your procedure with explanation why: Using my implementation, I got t = -0.76905 and a two-tailed p-value of 0.4437. Since p = 0.4437 > 0.05, I fail to reject the null hypothesis at the 5% level. This means that the observed difference between the sample mean of Y (130.7553) and μ0 = 158.9811 can be explained by random sampling variability; there is no statistically significant evidence that the mean of Y differs from μ0.
- Conclusion from built-in function with explanation why: The built-in function t.test produced the same t-statistic and p-value, so the conclusion is the same: we fail to reject H0 at alpha = 0.05. Therefore, the t.test output confirms that there is no significant difference between the mean of Y and μ0.
]{style="color:blue"}
[Formulate hypothesis for the one-tailed test.
- H0: μ >= μ0 = 158.9811
- Ha: μ < μ0
- Conclusion from your procedure with explanation why: For the left-tailed test, my manual calculation uses the same t-statistic t = -0.76905 and gives a one-tailed p-value of approx. 0.222. Since p ~ 0.222 > 0.05, I fail to reject the null hypothesis at the 5% significance level. This means that, although the sample mean of Y (130.7553) is smaller than 158.9811, the difference is not statistically significant and can be attributed to random sampling variation.
- Conclusion from built-in function with explanation why: The built-in function t.test(Y, mu = 158.9811, alternative = "less") returns the same one-tailed p-value (~ 0.222), so it leads to the same decision: we fail to reject H0 at alpha = 0.05. Thus, according to the t.test result, there is not enough evidence to conclude that the true mean of Y is smaller than 158.9811.
]{style="color:blue"}
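As a cross-check of both conclusions, the critical-value approach gives the same decisions (a sketch using `t_value`, `df` and `alpha` computed in the chunks above):
```{r}
# Critical value approach for the one-sample t-tests above
t_crit_two <- qt(1 - alpha/2, df)  # two-tailed: reject H0 if |t| > t_crit_two
t_crit_left <- qt(alpha, df)       # left-tailed: reject H0 if t < t_crit_left
cat("|t| =", abs(t_value), "vs two-tailed critical value", t_crit_two, "\n")
cat("t =", t_value, "vs left-tailed critical value", t_crit_left, "\n")
```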
## 2.2 Parametric hypothesis about the equality of two means from different samples $Y$ and $Z$, when variances are unknown
(1 point)
- Perform a two-tailed test and one one-tailed test to compare the means
of two different samples $Y$ and $Z$ based on the statistical data.
- For this purpose it is recommended to use the built-in function t.test
and follow the theory. Carefully choose the parameters of the
function.
- Answer the questions below.
```{r}
mean_Y <- mean(Y); mean_Z <- mean(Z)
sd_Y <- sd(Y); sd_Z <- sd(Z)
cat("mean of Y =",mean_Y)
cat("mean of Z =",mean_Z)
cat("SD of Y =",sd_Y)
cat("SD of Z =",sd_Z)
# Two-sided test (Welch, unequal variances)
tt_yz_two <- t.test(Y, Z, alternative = "two.sided", var.equal = FALSE)
cat("tt_yz_two =\n")
print(tt_yz_two)
# One-tailed tests
tt_yz_greater <- t.test(Y, Z, alternative = "greater") # mean(Y) > mean(Z)
cat("tt_yz_greater =\n")
print(tt_yz_greater)
tt_yz_less <- t.test(Y, Z, alternative = "less") # mean(Y) < mean(Z)
cat("tt_yz_less =\n")
print(tt_yz_less)
```
[Means of Y and Z:
- mean(Y) = 130.7553
- mean(Z) = 225.0230
Standard deviations of Y and Z:
- sd(Y) = 367.0212
- sd(Z) = 439.2312
]{style="color:blue"}
[Formulate hypothesis for the two-tailed test.
- H0: μY = μZ (the means of Y and Z are equal)
- Ha: μY ≠ μZ (the means of Y and Z are different)
- Conclusion: From the Welch two-sample t-test, we obtained t = −1.6469 with p-value = 0.1012. Since p = 0.1012 > 0.05, we fail to reject the null hypothesis at the 5% significance level. There is no statistically significant evidence that the mean of Y differs from the mean of Z.
]{style="color:blue"}
[Formulate hypothesis for the one-tailed test.
We test whether the mean of Y is less than the mean of Z.
- H0: μY >= μZ (mean of Y is greater or equal to mean of Z)
- Ha: μY < μZ (mean of Y is smaller than mean of Z)
- Conclusion: The left-tailed t-test gives t = −1.6469 and p-value = 0.05061. Since p ~ 0.0506 is slightly above 0.05, we fail to reject H0 at the 5% level. This means that although the sample mean of Y is numerically smaller, we do not have strong enough statistical evidence to conclude that μY < μZ.
]{style="color:blue"}
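For transparency, the Welch statistic and its Satterthwaite degrees of freedom reported by `t.test` can be reproduced by hand (a minimal sketch using only `Y` and `Z`):
```{r}
# Manual Welch t-statistic and Satterthwaite degrees of freedom
se2_Y <- var(Y) / length(Y)  # squared standard error of mean(Y)
se2_Z <- var(Z) / length(Z)  # squared standard error of mean(Z)
t_welch <- (mean(Y) - mean(Z)) / sqrt(se2_Y + se2_Z)
df_welch <- (se2_Y + se2_Z)^2 /
  (se2_Y^2 / (length(Y) - 1) + se2_Z^2 / (length(Z) - 1))
cat("Welch t =", t_welch, ", df =", df_welch, "\n")
```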
# 3. One-way ANOVA: task
You are given two data samples in separate files.
- 'Treatment.csv' includes the response variable "Score" and
additional variables (factors): HealthCondition, AgeGroup, Region,
DrugType, and TreatmentDuration.
- 'Competition.csv' includes the response variable "PerformanceTime"
and additional variables (factors): EventType, Method, AgeGroup,
Country, and Experience.
- The objective is to determine the existence of a statistically
significant difference among the means of the response variable
conditioned by some factor. Method - ANOVA analysis. All steps
and tasks are given below.
You can follow the theory slides or the examples.
## 3.1 Import data
(0.5 point)
- If the last digit of your student's vidko (ID) is 0-4, then import
data file 'Treatment.csv'. The response variable is "Score", the
factor is "Xn", where n is the last digit of your student's vidko (ID).
- If the last digit of your student's vidko (ID) is 5-9, then import
data file 'Competition.csv'. The response variable is
"PerformanceTime", the factor is "Xn", where n is the last digit of
your student's vidko (ID).
- Answer the questions below.
```{r, warning=FALSE, message=FALSE}
# data import
Competition <- read.csv("Competition.csv", skip = 1)
# Select response and factor (X5 = EventType)
df <- data.frame(
PerformanceTime = Competition$PerformanceTime,
EventType = factor(Competition$EventType),
Method = factor(Competition$Method),
AgeGroup = factor(Competition$AgeGroup),
Country = factor(Competition$Country),
Experience = factor(Competition$Experience)
)
# Levels of the chosen factor X5 = EventType
levels(df$EventType)
length(levels(df$EventType))
length(levels(df$Method))
length(levels(df$AgeGroup))
length(levels(df$Country))
length(levels(df$Experience))
```
- How many factors do you have in the sample?
[There are 5 factors in the Competition dataset (X5-X9): EventType, Method, AgeGroup, Country, and Experience.]{style="color:blue"}
- How many levels does each factor have?
[EventType has 4 levels.
Method has 4 levels.
AgeGroup has 5 levels.
Country has 6 levels.
Experience has 4 levels.]{style="color:blue"}
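The level counts listed above can also be cross-checked in one call (a compact sketch over the factor columns of `df`):
```{r, warning=FALSE, message=FALSE}
# Number of levels for every factor column of df
sapply(df[, sapply(df, is.factor)], nlevels)
```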
## 3.2 Describe and visualise your data
(0.5 point)
- Compute the grand mean and standard deviation.
- Compute the mean and standard deviation for each factor level to
compare the effect.
- Visualise your sample with the aim to show how the mean of response
variable differs against levels of factor you have in the sample.
- Answer the questions below.
```{r, warning=FALSE, message=FALSE}
# Grand mean and SD
grand_mean <- mean(df$PerformanceTime)
grand_sd <- sd(df$PerformanceTime)
cat("Mean =",grand_mean)
cat("SD =",grand_sd)
# Mean and SD per level
library(dplyr)
level_stats <- df %>%
group_by(EventType) %>%
summarise(
mean_PT = mean(PerformanceTime),
sd_PT = sd(PerformanceTime),
count = n()
)
level_stats
# Boxplot
boxplot(PerformanceTime ~ EventType, data=df,
main="Performance Time by Event Type",
xlab="Event Type",
ylab="Performance Time",
col="lightgray")
means <- tapply(df$PerformanceTime, df$EventType, mean)
plot(means, type="b", pch=19,
main="Mean Performance Time by Event Type",
xlab="Event Type",
ylab="Mean Performance Time",
xaxt="n")
axis(1, at=1:length(means), labels=names(means))
# Effect plot from the 'effects' package
library(effects)
model <- lm(PerformanceTime ~ EventType, data=df)
plot(allEffects(model))
```
[
- Are the means different?
Yes. The means of PerformanceTime differ across the EventType levels.
- Is the difference among the means statistically significant?
The numerical differences and the boxplots/mean plots indicate clear
separation between the means; the formal test of significance follows
in section 3.5.]{style="color:blue"}
## 3.3 Design one-way ANOVA experiment
(1 point)
[Formulate H0 and Ha. Determine a significance level, for example
$\alpha=0.05$.
$H_0$: $\mu_{Hurdles} = \mu_{Marathon} = \mu_{Relay} = \mu_{Sprint}$
$H_a$: At least one group mean is different from the others.]{style="color:blue"}
## 3.4 Check one-way ANOVA assumptions.
(1 point)
1. Check the homogeneity of variance assumption (Levene's test).
2. Check the normality assumption (Shapiro-Wilk test).
3. Check for significant outliers (If extreme outliers exist, remove
them).
```{r, warning=FALSE, message=FALSE}
anova_model <- aov(PerformanceTime ~ EventType, data=df)
# 1. Homogeneity of variance: Levene's test
library(car)
leveneTest(PerformanceTime ~ EventType, data=df)
# 2. Normality of residuals: Shapiro-Wilk test
resid_vals <- residuals(anova_model)
shapiro.test(resid_vals)
# 3. Check for outliers
boxplot(df$PerformanceTime ~ df$EventType, main="Outliers by EventType")
```
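As a visual complement to the Shapiro-Wilk test, a QQ-plot of the ANOVA residuals (using `resid_vals` from the chunk above):
```{r, warning=FALSE, message=FALSE}
# QQ-plot: points far from the line indicate non-normal residuals
qqnorm(resid_vals, main = "QQ-plot of ANOVA residuals")
qqline(resid_vals, col = "red", lwd = 2)
```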
[Give comments about each of test verified.
1. Homogeneity of variance (Levene’s test):
Levene’s test returned F = 84.92 and p ≈ 2.55×10⁻⁵².
Since p < 0.05, the variances among EventType groups are not equal.
Therefore, the homogeneity assumption is violated.
2. Normality of residuals (Shapiro–Wilk test):
The Shapiro–Wilk test gave W = 0.70035 with p < 2.2×10⁻¹⁶.
Since p < 0.05, residuals deviate strongly from normality.
The ANOVA normality assumption is violated.
3. Outliers:
Boxplots and summary statistics indicate the presence of extreme outliers,
especially in groups with large SD (e.g., Marathon).
Therefore, the data contain significant outliers.
Then, make a final decision about which procedure
(parametric or non-parametric) will be applied for further analysis.
Student's comments: Both core ANOVA assumptions (normality and equal variances) are severely violated.
Thus, parametric one-way ANOVA is not appropriate for this dataset.
A non-parametric test, specifically the Kruskal–Wallis test, should be used instead.]{style="color:blue"}
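Task 3.4 also asks to remove extreme outliers if they exist. A minimal sketch for flagging them per group, assuming the conventional 3×IQR boxplot rule as the definition of "extreme":
```{r, warning=FALSE, message=FALSE}
# Flag extreme outliers per EventType group using the 3*IQR rule
library(dplyr)
flagged <- df %>%
  group_by(EventType) %>%
  mutate(
    q1 = quantile(PerformanceTime, 0.25),
    q3 = quantile(PerformanceTime, 0.75),
    extreme = PerformanceTime < q1 - 3 * (q3 - q1) |
              PerformanceTime > q3 + 3 * (q3 - q1)
  ) %>%
  ungroup()
cat("Number of extreme outliers:", sum(flagged$extreme), "\n")
```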
## 3.5 Compute one-way ANOVA for the given factor.
(1 point)
- Perform ANOVA for 'Xn' factor.
- Make a conclusion from ANOVA.
```{r, warning=FALSE, message=FALSE}
# res.aov <- aov(response ~ factor, data = my_data) #F-test
# summary(res.aov)
# res.aov <-oneway.test(response ~ factor, data = my_data) # Welch one-way test could be used instead of classical approach with F test
# kruskal.test(response ~ factor, data = my_data) # Kruskal-Wallis rank sum test could be used instead of classical approach with F test
anova_model <- aov(PerformanceTime ~ EventType, data = df)
summary(anova_model)
welch <- oneway.test(PerformanceTime ~ EventType, data = df)
welch
kruskal.test(PerformanceTime ~ EventType, data = df)
```
[Formulate H0 and Ha. Determine a significance level $\alpha=0.05$.
$H_0$: μ_Hurdles = μ_Marathon = μ_Relay = μ_Sprint
$H_a$: At least one EventType group has a different mean PerformanceTime.
Student's comments:
I performed a classical one-way ANOVA using PerformanceTime as the response variable and EventType as the factor.
The ANOVA F-test returned F = 9383 with p-value < 2e-16, which is far below α = 0.05.
Therefore, I reject the null hypothesis.
This means that the mean PerformanceTime is significantly different across EventType levels.
The Welch ANOVA (unequal variances) and the Kruskal–Wallis test (non-parametric) also produced extremely small p-values (< 2.2e-16), confirming the same conclusion.
Overall, the event type has a strong and statistically significant effect on PerformanceTime.
]{style="color:blue"}
## 3.6 Multiple pairwise-comparison between the means of groups
(1 point)
Decide whether it makes sense to perform multiple pairwise-comparisons
for all factors you have in the sample. If yes - perform multiple
pairwise-comparisons and make final decision about the differences among
groups. If not - just give comments why this step is not relevant.
```{r, warning=FALSE, message=FALSE}
# TukeyHSD(res.aov)
# pairwise.wilcox.test(response, factor, p.adjust.method = "BH")
# Alternatively to Wilcox, Dunn test could be applied. Here, Reject Ho if p <= alpha/2
TukeyHSD(anova_model)
pairwise.wilcox.test(df$PerformanceTime, df$EventType,
p.adjust.method = "BH")
```
[Student's comments: since ANOVA showed strong evidence of differences among EventType groups, performing multiple comparisons is appropriate.
Tukey HSD and Wilcoxon post-hoc tests show that all pairs of EventType groups differ significantly (all adjusted p-values < 0.05).
Conclusion: every event type (Hurdles, Marathon, Relay, Sprint) has a different mean PerformanceTime compared to every other event type.
]{style="color:blue"}
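The chunk template also mentions Dunn's test as an alternative to pairwise Wilcoxon; a hedged sketch (assuming the FSA package is installed, which provides dunnTest()):
```{r, warning=FALSE, message=FALSE}
# Dunn's test with Benjamini-Hochberg adjustment (alternative post-hoc)
library(FSA)
dunnTest(PerformanceTime ~ EventType, data = df, method = "bh")
```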