---
title: "Labwork 3. Hypothesis Testing. Analysis of Variance"
author: "Kristupas Paulauskas, F5545"
output:
  html_document:
    df_print: paged
editor_options:
  markdown:
    wrap: 72
---
```{r setup, include=FALSE}
## other output files: html_notebook for notebook ; html_document for html file; pdf_document for PDF file
```
# DESCRIPTION
Objective: To develop the ability to apply hypothesis testing methods
and analysis of variance (ANOVA) to evaluate statistical differences
between data groups and draw evidence-based conclusions.
# 1. Goodness-of-fit distributional testing
### TASK: Apply Goodness-of-fit tests to check whether the sample X follows some specific distribution.
- Import data from Moodle, file Lab3_Dataset1.csv.
- Your data set is the n-th column of the imported file, where n is
the last digit of your student's vidko (ID).
```{r,message=FALSE, warning=FALSE}
# vidko (ID) = F5545
# data import
DataSet1 <- read.csv("Lab3_DataSet1.csv", header = TRUE)
```
## 1.1 Testing for Normal distribution
(2 points)
- Draw histogram of imported data sample and add the curve of normal
distribution density dnorm().
```{r message=FALSE, warning=FALSE}
X <- DataSet1[[5]] # column 5, since the last digit of vidko F5545 is 5
X <- X[is.finite(X)]
m <- mean(X)
s <- sd(X)
hist(X, freq = FALSE, col = "skyblue", border = "white", main = "Histogram with Normal Curve", xlab = "X")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2, col = "red")
```
- Determine mean and standard deviation.
```{r message=FALSE, warning=FALSE}
cat("Mean =", m, "\nSD =", s, "\n")
```
- Set significance level alpha=0.05
```{r}
alpha <- 0.05
```
- Testing for normality:
  - if vidko (ID) ends in 1-5, then apply the
Anderson-Darling Test (goftest::ad.test), Shapiro-Wilk Test
(shapiro.test) and Pearson's Chi-squared Test (chisq.test);
  - if vidko (ID) ends in 6-0, then apply the Kolmogorov-Smirnov Test
(ks.test), Shapiro-Wilk Test (shapiro.test) and Pearson's Chi-squared
Test (chisq.test).
```{r message=FALSE, warning=FALSE}
Z <- (X - m)/s # standardize, since ad.test with null = "pnorm" assumes the standard normal
ad_res <- goftest::ad.test(Z, null="pnorm")
sw_res <- shapiro.test(X)
h <- hist(X, plot = FALSE)
breaks_cdf <- pnorm(h$breaks, mean=m, sd=s)
null_probs <- zoo::rollapply(breaks_cdf, 2, function(x) x[2]-x[1])
chi_res <- chisq.test(h$counts, p=null_probs, rescale.p=TRUE, simulate.p.value=TRUE)
ad_res
sw_res
chi_res
```
- Answer the questions below.
```{r,message=FALSE, warning=FALSE}
# ad.test and ks.test require data to be standardized
## Chi-squared test template (below)
# intervals <- hist(data) # hist determines bins and counts the number of observations in the bins
# breaks_cdf <- pnorm(intervals$breaks, mean=mean(data), sd=sd(data)) # compute theoretical probabilities of normal distribution
# null.probs <- zoo::rollapply(breaks_cdf, 2, function(x) x[2]-x[1])
# chisq.test(intervals$counts, p=null.probs, rescale.p=TRUE, simulate.p.value = TRUE) # Chi2 test; simulate.p.value helps to achieve better approximation ; rescale.p = probabilities should sum up to 1
# qchisq(alpha,length(intervals$breaks)-2-1,lower.tail=FALSE) # critical value of chi2 distribution
```
[Formulate the hypothesis for Anderson-Darling Test / Kolmogorov-Smirnov
Test.
- H0: The sample comes from a normal distribution.
- Ha: The sample does not come from a normal distribution.
- Conclusion from testing:
If p-value > 0.05 -> fail to reject H0 -> the data do not contradict normality.
If p-value <= 0.05 -> reject H0 -> the data are not normally distributed.
]{style="color:blue"}
[Formulate the hypothesis for Shapiro-Wilk Test.
- H0: The data follow a normal distribution.
- Ha: The data do not follow a normal distribution.
- Conclusion from testing:
If p-value > 0.05 -> fail to reject H0 -> the data are consistent with normality.
If p-value <= 0.05 -> reject H0 -> the data are not normally distributed.
]{style="color:blue"}
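To make these decision rules concrete, here is a small helper (a sketch; it assumes the objects `ad_res` and `sw_res` from the testing chunk above) that compares each p-value with alpha:
```{r message=FALSE, warning=FALSE}
# Sketch: automate the p-value decision rule stated above
# (assumes ad_res and sw_res exist from the testing chunk)
decide <- function(p, alpha = 0.05) {
  if (p > alpha) "fail to reject H0" else "reject H0"
}
cat("Anderson-Darling:", decide(ad_res$p.value), "\n")
cat("Shapiro-Wilk:", decide(sw_res$p.value), "\n")
```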
[Formulate the hypothesis for Pearson's Chi-squared Test.
- H0: The data follow a normal distribution with parameters μ and σ
estimated from the sample.
- Ha: The data do not follow this normal distribution.
- How many bins did you get?]{style="color:blue"}
```{r message=FALSE, warning=FALSE}
k <- length(h$counts)
k
```
[- What is the meaning of pnorm()?
pnorm() is the cumulative distribution function (CDF) of the normal
distribution.
- Conclusion from testing:
If p-value > 0.05 -> fail to reject H0 -> the histogram frequencies
match a normal distribution.
If p-value <= 0.05 -> reject H0 -> the distribution differs from a
normal distribution.
- Find the quantile of the Chi-squared distribution qchisq() and draw a
conclusion for the hypothesis by comparing the test statistic and the
quantile.
To apply the critical value approach, we compute the quantile of the
$\chi^2$ distribution:]{style="color:blue"}
- Number of bins (k): `length(h$counts)`
- Estimated parameters: 2 (mean and standard deviation)
- Degrees of freedom:
```{r message=FALSE, warning=FALSE}
df <- k - 1 - 2 # k bins, minus 1, minus 2 estimated parameters (mean, sd)
cat("Degrees of freedom =", df, "\n")
```
The critical value is obtained using:
``` {r message=FALSE, warning=FALSE}
cat("Critical Chi-square value =", qchisq(alpha, df, lower.tail = FALSE),"\n")
cat("Chi-square test statistic =", chi_res$statistic, "\n")
```
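To make the critical-value decision explicit, a short sketch comparing the statistic from the chunk above against the quantile:
```{r message=FALSE, warning=FALSE}
# Critical value approach: reject H0 if the chi-square statistic
# exceeds the upper-alpha quantile of the chi-square distribution
if (chi_res$statistic > qchisq(alpha, df, lower.tail = FALSE)) {
  cat("Statistic exceeds the critical value -> reject H0\n")
} else {
  cat("Statistic does not exceed the critical value -> fail to reject H0\n")
}
```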
## 1.2 Testing for other than Normal distribution
(1 point)
Assume that data could be described using Student's t distribution.
- Apply any one goodness-of-fit test to check whether sample X
can be described using Student's t distribution.
- Repeat the same steps following task 1.1.
  - p.s. add the curve of Student's t distribution dt() instead of the
normal distribution.
- Answer the questions below.
```{r,message=FALSE, warning=FALSE}
# for Student's t you will need degrees of freedom (dof): params<-QRM::fit.st(data); dof<-params$par.ests[["nu"]]
data <- X
Z_t <- as.numeric(scale(data))
library(QRM)
params<-QRM::fit.st(data)
dof <-params$par.ests[["nu"]]
# Plot histogram and t curve
hist(Z_t, freq = FALSE, col = "skyblue", border = "white",
main = "Histogram of Z with Student's t Curve",
xlab = "Z = (X - mean)/sd")
curve(dt(x, df = dof), add = TRUE, lwd = 2, col = "red")
# Kolmogorov-Smirnov test
ks_res_t <- ks.test(Z_t, "pt", df = dof)
ks_res_t
```
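Note that `scale()` standardizes with the sample mean/sd. As a hedged alternative (assuming `fit.st`'s `par.ests` also stores the fitted location `mu` and scale `sigma`, alongside `nu`), one could standardize with the fitted parameters instead:
```{r message=FALSE, warning=FALSE}
# Sketch: standardize with the fitted location/scale from fit.st
# (assumption: par.ests contains "mu" and "sigma" alongside "nu")
mu_fit <- params$par.ests[["mu"]]
sigma_fit <- params$par.ests[["sigma"]]
Z_fit <- (data - mu_fit) / sigma_fit
ks.test(Z_fit, "pt", df = dof)
```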
[What is the value of degrees of freedom?
Fitting the Student's t distribution with QRM::fit.st estimates the
degrees of freedom as
`df = 22718.6`,
and I used this value in the KS test and in the Student's t density curve.]{style="color:blue"}
[Formulate the hypothesis of the selected test.
- H0: The standardized sample
$Z = (X - \bar{X})/s$ follows a Student's t distribution with
df = 22718.6 degrees of freedom.
- Ha: The standardized sample
Z does not follow a Student's t distribution with df = 22718.6 degrees of freedom.
- Conclusion from testing: D = 0.012569, p-value = 0.7304, which means that we fail to reject H0 and conclude that the Student's t distribution with df = 22718.6 is consistent with the data.
]{style="color:blue"}
[Now choose degrees of freedom df=3. What results from the hypothesis did
you get?
The KS test gives D = 0.055262 and p-value = 2.204e-08.
Since the p-value is far below alpha = 0.05, we reject H0, so the
Student's t distribution with df = 3 does not describe the data well.
This confirms that the data are much closer to a normal distribution
(a t distribution with very large df) than to the heavy-tailed t with df = 3.]{style="color:blue"}
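The df = 3 figures quoted above should be reproducible with the same KS call used earlier:
```{r message=FALSE, warning=FALSE}
# Same KS test, now against a heavy-tailed t distribution with df = 3
ks.test(Z_t, "pt", df = 3)
```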
# 2. Parametric hypothesis about normal distribution parameters
We assume that X represents our population, and we draw smaller random
samples (e.g., Y and Z), each consisting of 100 observations from X.
While sampling, a different random generation seed should be set for Y
and Z.
```{r}
set.seed(2005)
Y <- X[sample(length(X), 100)]
set.seed(1995)
Z <- X[sample(length(X), 100)]
```
## 2.1 Parametric hypothesis about Y normal distribution parameter - mean $\mu$, when variance $\sigma^2$ is unknown
(1 point)
- Perform a two-tailed test and one one-tailed test following the theory
slides about the Normal distribution parameter $\mu$ when $\sigma^2$ is
not known, for data sample $Y$.
  - Assume that $\mu_0$ is the mean of $X$ (status quo).
  - For this purpose implement the procedure yourself and then use the
built-in function t.test.
  - Also, you can follow the examples from
- Answer the questions below.
```{r}
mu0<-mean(X) # hypothesized value
# Sample stats
n <- length(Y)
ybar <- mean(Y)
s <- sd(Y)
# Test statistic (t-value)
t_value <- (ybar - mu0) / (s / sqrt(n))
df <- n - 1
# p-values
p_two <- 2 * (1 - pt(abs(t_value), df))
p_left <- pt(t_value, df) # Y mean < mu0
p_right <- 1 - pt(t_value, df) # Y mean > mu0
# Built-in function
tt_two <- t.test(Y, mu = mu0, alternative = "two.sided")
tt_right <- t.test(Y, mu = mu0, alternative = "greater")
tt_left <- t.test(Y, mu = mu0, alternative = "less")
cat("t-statistic =",t_value)
cat("Two-tailed p-value =",p_two)
cat("t.test result:\n")
print(tt_two)
```
[Formulate hypothesis for the two-tailed test.
- H0: μ = μ0 = 158.9811
- Ha: μ != μ0
- Conclusion from your procedure with explanation why: Using my implementation, I got t = -0.76905 and a two-tailed p-value of 0.4437. Since p = 0.4437 > 0.05, I fail to reject the null hypothesis at the 5% level. This means that the observed difference between the sample mean of Y (130.7553) and μ0 = 158.9811 can be explained by random sampling variability; there is no statistically significant evidence that the mean of Y differs from μ0.
- Conclusion from built-in function with explanation why: The built-in function t.test produced the same t-statistic and p-value, so the conclusion is the same: we fail to reject H0 at alpha = 0.05. Therefore, the t.test output confirms that there is no significant difference between the mean of Y and μ0.
]{style="color:blue"}
[Formulate hypothesis for the one-tailed test.
- H0: μ >= μ0 = 158.9811
- Ha: μ < μ0
- Conclusion from your procedure with explanation why: For the left-tailed test, my manual calculation uses the same t-statistic t = -0.76905 and gives a one-tailed p-value of approx. 0.222. Since p ~ 0.222 > 0.05, I fail to reject the null hypothesis at the 5% significance level. This means that, although the sample mean of Y (130.7553) is smaller than 158.9811, the difference is not statistically significant and can be attributed to random sampling variation.
- Conclusion from built-in function with explanation why: The built-in function t.test(Y, mu = 158.9811, alternative = "less") returns the same one-tailed p-value (~ 0.222), so it leads to the same decision: we fail to reject H0 at alpha = 0.05. Thus, according to the t.test result, there is not enough evidence to conclude that the true mean of Y is smaller than 158.9811.
]{style="color:blue"}
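As a cross-check of both conclusions, the critical-value approach gives the same decisions (a sketch using `t_value`, `df` and `alpha` computed in the chunks above):
```{r}
# Critical value approach for the one-sample t-tests above
t_crit_two <- qt(1 - alpha/2, df)  # two-tailed: reject H0 if |t| > t_crit_two
t_crit_left <- qt(alpha, df)       # left-tailed: reject H0 if t < t_crit_left
cat("|t| =", abs(t_value), "vs two-tailed critical value", t_crit_two, "\n")
cat("t =", t_value, "vs left-tailed critical value", t_crit_left, "\n")
```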
## 2.2 Parametric hypothesis about the equality of two means from different samples $Y$ and $Z$, when variances are unknown
(1 point)
- Perform a two-tailed test and one one-tailed test to compare the means
of two different samples $Y$ and $Z$ based on the statistical data.
- For this purpose it is recommended to use the built-in function t.test
and follow the theory. Carefully choose the parameters of the
function.
- Answer the questions below.
```{r}
mean_Y <- mean(Y); mean_Z <- mean(Z)
sd_Y <- sd(Y); sd_Z <- sd(Z)
cat("mean of Y =",mean_Y)
cat("mean of Z =",mean_Z)
cat("SD of Y =",sd_Y)
cat("SD of Z =",sd_Z)
# Two-sided test (Welch, unequal variances)
tt_yz_two <- t.test(Y, Z, alternative = "two.sided", var.equal = FALSE)
cat("tt_yz_two =\n")
print(tt_yz_two)
# One-tailed tests
tt_yz_greater <- t.test(Y, Z, alternative = "greater") # mean(Y) > mean(Z)
cat("tt_yz_greater =\n")
print(tt_yz_greater)
tt_yz_less <- t.test(Y, Z, alternative = "less") # mean(Y) < mean(Z)
cat("tt_yz_less =\n")
print(tt_yz_less)
```
[Means of Y and Z:
- mean(Y) = 130.7553
- mean(Z) = 225.0230
Standard deviations of Y and Z:
- sd(Y) = 367.0212
- sd(Z) = 439.2312
]{style="color:blue"}
[Formulate hypothesis for the two-tailed test.
- H0: μY = μZ (the means of Y and Z are equal)
- Ha: μY ≠ μZ (the means of Y and Z are different)
- Conclusion: From the Welch two-sample t-test, we obtained t = −1.6469 with p-value = 0.1012. Since p = 0.1012 > 0.05, we fail to reject the null hypothesis at the 5% significance level. There is no statistically significant evidence that the mean of Y differs from the mean of Z.
]{style="color:blue"}
[Formulate hypothesis for the one-tailed test.
We test whether the mean of Y is less than the mean of Z.
- H0: μY >= μZ (mean of Y is greater or equal to mean of Z)
- Ha: μY < μZ (mean of Y is smaller than mean of Z)
- Conclusion: The left-tailed t-test gives t = −1.6469 and p-value = 0.05061. Since p ~ 0.0506 is slightly above 0.05, we fail to reject H0 at the 5% level. This means that although the sample mean of Y is numerically smaller, we do not have strong enough statistical evidence to conclude that μY < μZ.
]{style="color:blue"}
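For transparency, the Welch statistic and its Satterthwaite degrees of freedom reported by `t.test` can be reproduced by hand (a minimal sketch using only `Y` and `Z`):
```{r}
# Manual Welch t-statistic and Satterthwaite degrees of freedom
se2_Y <- var(Y) / length(Y)  # squared standard error of mean(Y)
se2_Z <- var(Z) / length(Z)  # squared standard error of mean(Z)
t_welch <- (mean(Y) - mean(Z)) / sqrt(se2_Y + se2_Z)
df_welch <- (se2_Y + se2_Z)^2 /
  (se2_Y^2 / (length(Y) - 1) + se2_Z^2 / (length(Z) - 1))
cat("Welch t =", t_welch, ", df =", df_welch, "\n")
```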
# 3. One-way ANOVA: task
You are given two data samples in separate files.
- 'Treatment.csv' includes the response variable "Score" and
additional variables (factors): HealthCondition, AgeGroup, Region,
DrugType, and TreatmentDuration.
- 'Competition.csv' includes the response variable "PerformanceTime"
and additional variables (factors): EventType, Method, AgeGroup,
Country, and Experience.
- The objective is to determine the existence of a statistically
significant difference among the means of the response variable
conditioned by some factor. Method - ANOVA analysis. All steps
and tasks are given below.
You can follow the theory slides or the examples.
## 3.1 Import data
(0.5 point)
- If the last digit of your student's vidko (ID) is 0-4, then import
data file 'Treatment.csv'. The response variable is "Score", the
factor is "Xn", where n is the last digit of your student's vidko (ID).
- If the last digit of your student's vidko (ID) is 5-9, then import
data file 'Competition.csv'. The response variable is
"PerformanceTime", the factor is "Xn", where n is the last digit of
your student's vidko (ID).
- Answer the questions below.
```{r, warning=FALSE, message=FALSE}
# data import
Competition <- read.csv("Competition.csv", skip = 1)
# Select response and factor (X5 = EventType)
df <- data.frame(
PerformanceTime = Competition$PerformanceTime,
EventType = factor(Competition$EventType),
Method = factor(Competition$Method),
AgeGroup = factor(Competition$AgeGroup),
Country = factor(Competition$Country),
Experience = factor(Competition$Experience)
)
# Levels of the chosen factor X5 = EventType
levels(df$EventType)
length(levels(df$EventType))
length(levels(df$Method))
length(levels(df$AgeGroup))
length(levels(df$Country))
length(levels(df$Experience))
```
- How many factors do you have in the sample?
[There are 5 factors in the Competition dataset (X5-X9): EventType, Method, AgeGroup, Country, and Experience.]{style="color:blue"}
- How many levels does each factor have?
[EventType has 4 levels.
Method has 4 levels.
AgeGroup has 5 levels.
Country has 6 levels.
Experience has 4 levels.]{style="color:blue"}
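The level counts listed above can also be cross-checked in one call (a compact sketch over the factor columns of `df`):
```{r, warning=FALSE, message=FALSE}
# Number of levels for every factor column of df
sapply(df[, sapply(df, is.factor)], nlevels)
```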
## 3.2 Describe and visualise your data
(0.5 point)
- Compute the grand mean and standard deviation.
- Compute the mean and standard deviation for each factor level to
compare the effect.
- Visualise your sample with the aim to show how the mean of response
variable differs against levels of factor you have in the sample.
- Answer the questions below.
```{r, warning=FALSE, message=FALSE}
# Grand mean and SD
grand_mean <- mean(df$PerformanceTime)
grand_sd <- sd(df$PerformanceTime)
cat("Mean =",grand_mean)
cat("SD =",grand_sd)
# Mean and SD per level
library(dplyr)
level_stats <- df %>%
group_by(EventType) %>%
summarise(
mean_PT = mean(PerformanceTime),
sd_PT = sd(PerformanceTime),
count = n()
)
level_stats
# Boxplot
boxplot(PerformanceTime ~ EventType, data=df,
main="Performance Time by Event Type",
xlab="Event Type",
ylab="Performance Time",
col="lightgray")
means <- tapply(df$PerformanceTime, df$EventType, mean)
plot(means, type="b", pch=19,
main="Mean Performance Time by Event Type",
xlab="Event Type",
ylab="Mean Performance Time",
xaxt="n")
axis(1, at=1:length(means), labels=names(means))
# Effect plot from the 'effects' package
library(effects)
model <- lm(PerformanceTime ~ EventType, data=df)
plot(allEffects(model))
```
[
- Are the means different?
Yes. The means of PerformanceTime differ across the EventType levels.
- Is the difference among the means statistically significant?
The numerical differences and the boxplots/mean plots indicate clear
separation between the means; the formal test of significance follows
in section 3.5.]{style="color:blue"}
## 3.3 Design one-way ANOVA experiment
(1 point)
[Formulate H0 and Ha. Determine a significance level, for example
$\alpha=0.05$.
$H_0$: $\mu_{Hurdles} = \mu_{Marathon} = \mu_{Relay} = \mu_{Sprint}$
$H_a$: At least one group mean is different from the others.]{style="color:blue"}
## 3.4 Check one-way ANOVA assumptions.
(1 point)
1. Check the homogeneity of variance assumption (Levene's test).
2. Check the normality assumption (Shapiro-Wilk test).
3. Check for significant outliers (If extreme outliers exist, remove
them).
```{r, warning=FALSE, message=FALSE}
anova_model <- aov(PerformanceTime ~ EventType, data=df)
# 1. Homogeneity of variance: Levene's test
library(car)
leveneTest(PerformanceTime ~ EventType, data=df)
# 2. Normality of residuals: Shapiro-Wilk test
resid_vals <- residuals(anova_model)
shapiro.test(resid_vals)
# 3. Check for outliers
boxplot(df$PerformanceTime ~ df$EventType, main="Outliers by EventType")
```
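As a visual complement to the Shapiro-Wilk test, a QQ-plot of the ANOVA residuals (using `resid_vals` from the chunk above):
```{r, warning=FALSE, message=FALSE}
# QQ-plot: points far from the line indicate non-normal residuals
qqnorm(resid_vals, main = "QQ-plot of ANOVA residuals")
qqline(resid_vals, col = "red", lwd = 2)
```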
[Give comments about each of test verified.
1. Homogeneity of variance (Levene’s test):
Levene’s test returned F = 84.92 and p ≈ 2.55×10⁻⁵².
Since p < 0.05, the variances among EventType groups are not equal.
Therefore, the homogeneity assumption is violated.
2. Normality of residuals (Shapiro–Wilk test):
The Shapiro–Wilk test gave W = 0.70035 with p < 2.2×10⁻¹⁶.
Since p < 0.05, residuals deviate strongly from normality.
The ANOVA normality assumption is violated.
3. Outliers:
Boxplots and summary statistics indicate the presence of extreme outliers,
especially in groups with large SD (e.g., Marathon).
Therefore, the data contain significant outliers.
Then, make a final decision about which procedure
(parametric or non-parametric) will be applied for further analysis.
Student's comments: Both core ANOVA assumptions (normality and equal variances) are severely violated.
Thus, parametric one-way ANOVA is not appropriate for this dataset.
A non-parametric test, specifically the Kruskal–Wallis test, should be used instead.]{style="color:blue"}
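Task 3.4 also asks to remove extreme outliers if they exist. A minimal sketch for flagging them per group, assuming the conventional 3×IQR boxplot rule as the definition of "extreme":
```{r, warning=FALSE, message=FALSE}
# Flag extreme outliers per EventType group using the 3*IQR rule
library(dplyr)
flagged <- df %>%
  group_by(EventType) %>%
  mutate(
    q1 = quantile(PerformanceTime, 0.25),
    q3 = quantile(PerformanceTime, 0.75),
    extreme = PerformanceTime < q1 - 3 * (q3 - q1) |
              PerformanceTime > q3 + 3 * (q3 - q1)
  ) %>%
  ungroup()
cat("Number of extreme outliers:", sum(flagged$extreme), "\n")
```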
## 3.5 Compute one-way ANOVA for the given factor.
(1 point)
- Perform ANOVA for 'Xn' factor.
- Make a conclusion from ANOVA.
```{r, warning=FALSE, message=FALSE}
# res.aov <- aov(response ~ factor, data = my_data) #F-test
# summary(res.aov)
# res.aov <-oneway.test(response ~ factor, data = my_data) # Welch one-way test could be used instead of classical approach with F test
# kruskal.test(response ~ factor, data = my_data) # Kruskal-Wallis rank sum test could be used instead of classical approach with F test
anova_model <- aov(PerformanceTime ~ EventType, data = df)
summary(anova_model)
welch <- oneway.test(PerformanceTime ~ EventType, data = df)
welch
kruskal.test(PerformanceTime ~ EventType, data = df)
```
[Formulate H0 and Ha. Determine a significance level $\alpha=0.05$.
$H_0$: μ_Hurdles = μ_Marathon = μ_Relay = μ_Sprint
$H_a$: At least one EventType group has a different mean PerformanceTime.
Student's comments:
I performed a classical one-way ANOVA using PerformanceTime as the response variable and EventType as the factor.
The ANOVA F-test returned F = 9383 with p-value < 2e-16, which is far below α = 0.05.
Therefore, I reject the null hypothesis.
This means that the mean PerformanceTime is significantly different across EventType levels.
The Welch ANOVA (unequal variances) and the Kruskal–Wallis test (non-parametric) also produced extremely small p-values (< 2.2e-16), confirming the same conclusion.
Overall, the event type has a strong and statistically significant effect on PerformanceTime.
]{style="color:blue"}
## 3.6 Multiple pairwise-comparison between the means of groups
(1 point)
Decide whether it makes sense to perform multiple pairwise-comparisons
for all factors you have in the sample. If yes - perform multiple
pairwise-comparisons and make final decision about the differences among
groups. If not - just give comments why this step is not relevant.
```{r, warning=FALSE, message=FALSE}
# TukeyHSD(res.aov)
# pairwise.wilcox.test(response, factor, p.adjust.method = "BH")
# Alternatively to Wilcox, Dunn test could be applied. Here, Reject Ho if p <= alpha/2
TukeyHSD(anova_model)
pairwise.wilcox.test(df$PerformanceTime, df$EventType,
p.adjust.method = "BH")
```
[Student's comments: since ANOVA showed strong evidence of differences among EventType groups, performing multiple comparisons is appropriate.
Tukey HSD and Wilcoxon post-hoc tests show that all pairs of EventType groups differ significantly (all adjusted p-values < 0.05).
Conclusion: every event type (Hurdles, Marathon, Relay, Sprint) has a different mean PerformanceTime compared to every other event type.
]{style="color:blue"}
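The chunk template also mentions Dunn's test as an alternative to pairwise Wilcoxon; a hedged sketch (assuming the FSA package is installed, which provides dunnTest()):
```{r, warning=FALSE, message=FALSE}
# Dunn's test with Benjamini-Hochberg adjustment (alternative post-hoc)
library(FSA)
dunnTest(PerformanceTime ~ EventType, data = df, method = "bh")
```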