---
title: "Labwork 3. Hypothesis Testing. Analysis of Variance"
author: "Kristupas Paulauskas, F5545"
output:
  html_document:
    df_print: paged
editor_options: 
  markdown: 
    wrap: 72
---

```{r setup, include=FALSE}
## other output files:  html_notebook for notebook ; html_document for html file; pdf_document for PDF file
```

# DESCRIPTION

Objective: To develop the ability to apply hypothesis testing methods
and analysis of variance (ANOVA) to evaluate statistical differences
between data groups and draw evidence-based conclusions.

# 1. Goodness-of-fit distributional testing

### TASK: Apply Goodness-of-fit tests to check whether the sample X follows some specific distribution.

-   Import data from Moodle, file Lab3_Dataset1.csv. <br/>
-   Your data set - the n column from imported file, where n - is
    the last digit of your student's vidko (ID).

```{r,message=FALSE, warning=FALSE}
# vidko (ID) =  F5545
# data import
DataSet1 <- read.csv("Lab3_DataSet1.csv", header = TRUE)
```

## 1.1 Testing for Normal distribution

(2 points)

-   Draw histogram of imported data sample and add the curve of normal
    distribution density dnorm(). <br/>

```{r message=FALSE, warning=FALSE}
X <- DataSet1[[5]]
X <- X[is.finite(X)]
m <- mean(X)
s <- sd(X)
hist(X, freq = FALSE, col = "skyblue", border = "white", main = "Histogram with Normal Curve", xlab = "X")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2, col = "red")
```

-   Determine mean and standard deviation. <br/>

```{r message=FALSE, warning=FALSE}
cat("Mean =", m, "\nSD =", s, "\n")
```

-   Set significance level alpha=0.05 <br/>

```{r}
alpha <- 0.05
```

-   Testing for normality: <br/> -- if vidko (ID) 1-5, then apply
    Anderson-Darling Test (goftest::ad.test), Shapiro-Wilk Test
    (shapiro.test) and Pearson's Chi-squared Test (chisq.test). <br/> --
    if vidko (ID) 6-0, then apply Kolmogorov-Smirnov Test (ks.test),
    Shapiro-Wilk Test (shapiro.test) and Pearson's Chi-squared Test
    (chisq.test) <br/>

```{r message=FALSE, warning=FALSE}
Z <- (X - m)/s
ad_res <- goftest::ad.test(Z, null="pnorm")
sw_res <- shapiro.test(X)
h <- hist(X, plot = FALSE)
breaks_cdf <- pnorm(h$breaks, mean=m, sd=s)
null_probs <- zoo::rollapply(breaks_cdf, 2, function(x) x[2]-x[1])
chi_res <- chisq.test(h$counts, p=null_probs, rescale.p=TRUE, simulate.p.value=TRUE)

ad_res
sw_res
chi_res
```

-   Answer the questions below. <br/>

```{r,message=FALSE, warning=FALSE}
# ad.test and ks.test requires data to be standardized

## Chi-squared test template (below)
# intervals <- hist(data) # hist determines bins and count the number of observations in the bins
# breaks_cdf <- pnorm(intervals$breaks, mean=mean(data), sd=sd(data))  # compute theoretical probabilities of normal distribution
# null.probs <- zoo::rollapply(breaks_cdf, 2, function(x) x[2]-x[1])
# chisq.test(intervals$counts, p=null.probs, rescale.p=TRUE, simulate.p.value = TRUE) # Chi2 test; simulate.p.value helps to achieve better approximation ; rescale.p = probabilities should sum up to 1
# qchisq(alpha,length(intervals$breaks)-2-1,lower.tail=FALSE) # critical value of chi2 distribution 
```

[Formulate the hypothesis for Anderson-Darling Test / Kolmogorov-Smirnov
Test. <br/>
- H0: The sample comes from a normal distribution. <br/> 
-Ha: The sample does not come from a normal distribution.
<br/> -Conclusion from testing: <br/>
If p-value \> 0.05 -> fail to reject H0 -\> the data do not contradict normality. <br/>If p-value <= 0.05 ->
reject H0 -\> the data are not normally distributed.
<br/>]{style="color:blue"} [Formulate the hypothesis for Shapiro-Wilk
Test. <br/> - H0: The data follow a normal distribution.<br/> - Ha: The
data do not follow a normal distribution. <br/> - Conclusion from
testing:<br/>If p-value \> 0.05 -\> fail to reject H0 -\> the data are
consistent with normality.<br/>If p-value ≤ 0.05 -\> reject H0 -\> the
data are not normally distributed. <br/>]{style="color:blue"}
[Formulate the the hypothesis for Pearson's Chi-squared Test. <br/> -
H0: The data follow a normal distribution with parameters μ and σ
estimated from the sample.<br/> 
- Ha: The data do not follow this normal
distribution.<br/> 
- How many bins you got?]{style="color:blue"}

```{r message=FALSE, warning=FALSE}
k <- length(h$counts)
k
```

[-   What is the meaning of pnorm()? <br/>is the cumulative distribution
    function (CDF) of the normal distribution. <br/> - Conclusion from
    testing: <br/>If p-value \> 0.05 -\> fail to reject H0 -\> the
    histogram frequencies match a normal distribution. <br/>If
    p-value ≤ 0.05 -\> reject H0 -\> the distribution differs from a
    normal distribution. <br/> - Find quantile of Chi-squared
    distribution qchisq() and make conclusion from the hypothesis
    comparing test statistic and quantile. <br/>To apply the critical value
    approach, we compute the quantile of the χ^2 distribution:]{style="color:blue"}

-   Number of bins (k): `length(h$counts)`

-   Estimated parameters: 2 (mean and standard deviation)

-   Degrees of freedom:
```{r message=FALSE, warning=FALSE}
df = k - 1 - 2
cat("Degrees of freedom =", df, "\n")
```

The critical value is obtained using:

``` {r message=FALSE, warning=FALSE}
cat("Critical Chi-square value =", qchisq(alpha, df, lower.tail = FALSE),"\n")
cat("Chi-square test statistic =", chi_res$statistic, "\n")
```

## 1.2 Testing for other than Normal distribution

(1 point)

Assume that data could be described using Student's t distribution.
<br/> - apply one any goodness-of-fit test to to test whether sample X
can be described using Student's t distribution. <br/> - repeat the same
steps following task 1.1. <br/> - - p.s. add curve of Student's t
distribution dt() instead of normal distribution. - Answer the questions
below. <br/>

```{r,message=FALSE, warning=FALSE}
# for Student's t you will need degrees of freedom (dof):  params<-QRM::fit.st(data); dof<-params$par.ests[["nu"]]
data <- X
Z_t <- as.numeric(scale(data))

library(QRM)
params<-QRM::fit.st(data)
dof <-params$par.ests[["nu"]]
# Plot histogram and t curve
hist(Z_t, freq = FALSE, col = "skyblue", border = "white",
     main = "Histogram of Z with Student's t Curve",
     xlab = "Z = (X - mean)/sd")

curve(dt(x, df = dof), add = TRUE, lwd = 2, col = "red")

# Kolmogorov-Smirnov testas
ks_res_t <- ks.test(Z_t, "pt", df = dof)
ks_res_t

```

[What is the value of degrees of freedom ? <br/>In the goodness-of-fit test for the Student’s t distribution I get that the degrees of freedom are
`df=22718.6`
and I used this value in the KS test and in the Student's t density curve.]{style="color:blue"}<br/>
[Formulate the hypothesis of selected test <br/> 
- H0: The standardized sample 
Z=(X−inv(X))/s follows a Student’s t distribution with 
df=22718.6 degrees of freedom. <br/> - Ha:The standardized sample 
Z does not follow a Student’s t distribution with df = 22718.6 degrees of freedom. <br/> - Conclusion from testing: D = 0.012569, p-value = 0.7304, that means that we we fail to reject H0 and conclude that the Student’s t distribution with df = 22718.6 is consistent with the data
<br/>]{style="color:blue"} [Now choose degrees of freedom df=3. What results from the hypothesis you
got? <br/> KS test gives D = 0.055262 and p-value = 2.204e-08.
Since the p-value is far below alpha = 0.05, we reject H0.
For that reason Student’s t distribution with df = 3 does not describe the data well.
This confirms that the t3 distribution is much closer to normal.]{style="color:blue"}

# 2. Parametric hypothesis about normal distribution parameters

We assume that X - our population, and we create other smaller samples
(e.g. Y, Z) of size 100 observations randomly selected from X

We assume that X represents our population, and we draw smaller random
samples (e.g., Y and Z), each consisting of 100 observations from X.
While sampling, different random generation seed should be set for Y and
Z

```{r}
set.seed(2005)
Y <- X[sample(length(X), 100)]

set.seed(1995)
Z <- X[sample(length(X), 100)] 
```

# 2.1 Parametric hypothesis about Y normal distribution parameter - mean $\mu$, when variance $\sigma^2$ is unknown

(1 point)

-   Perform two.tailed test and one one.tailed test following the theory
    slides about Normal distribution parameter $\mu$ when $\sigma^2$ is
    not known for data sample $Y$. <br/> -- assume that $\mu_0$ is the
    mean of $X$ (status quo). -- for this purpose implement the
    procedure by yourself and then use built-in function t.test <br/> --
    Also, you can follow the examples from
    <http://www.r-tutor.com/elementary-statistics/hypothesis-testing>
    <br/>
-   Answer the questions below. <br/>

```{r}
mu0<-mean(X) # hypothesized value 

# Sample stats
n  <- length(Y)
ybar <- mean(Y)
s <- sd(Y)

# Test statistic (t-value)
t_value <- (ybar - mu0) / (s / sqrt(n))
df <- n - 1

# p-values
p_two  <- 2 * (1 - pt(abs(t_value), df))
p_left <- pt(t_value, df)           # Y mean < mu0
p_right <- 1 - pt(t_value, df)      # Y mean > mu0

# Built-in function
tt_two   <- t.test(Y, mu = mu0, alternative = "two.sided")
tt_right <- t.test(Y, mu = mu0, alternative = "greater")
tt_left  <- t.test(Y, mu = mu0, alternative = "less")

cat("t-statistic =",t_value)
cat("Two-tailed p-value =",p_two)
cat("t.test result:\n")
print(tt_two)
```

[Formulate hypothesis for two.tailed test. <br/> 
- H0: μ = μ0 = 158.9811 <br/> 
- Ha: μ != μ0<br/> 
- Conclusion from your procedure with explanation why: Using my implementation, I got that t = -0.76905 and a two-tailed p-value of 0.4437. Since p = 0.4437 > 0.05 I fail to reject the null hypothesis at the 5% level. This means that the observed difference between the sample mean of Y (130.7553) and μ0 = 158.9811 can be explained by random sampling variability and also there is no statstically significant evidence that the mean of Y differs from μ0. <br/> 
- Conclusion from built-in function with explanation why: The built-in function t.test produced the same t-statistic and p-value so the conclusion is the same: we fail to reject H0 at alpha = 0.05. Therefore, the t.test output confirms that there isn't significant difference between the mean of Y and μ0. <br/>]{style="color:blue"} 

[Formulate hypothesis for one one.tailed test. <br/> 
- H0: μ >= μ0 = 158.9811 <br/> 
- Ha: μ < μ0<br/> 
- Conclusion from your procedure with explanation why: For the left-tailed test, my manual calculation uses the same t-statistic t = -0.76905 and gives a one-tailed p-value of approx 0.222. Since p ~ 0.222 > 0.05, I fail to reject the null hypothesis at the 5% significance level. This means that, although the sample mean of Y (130.7553) is smaller than 158.9811, this difference is not statistically significant and can be attributed to random sampling variation. <br/> 
- Conclusion from built-in function with explanation why: The built-in function t.test(Y, mu = 158.9811, alternative = "less") returns the same one-tailed p-value (~ 0.222), so it leads to the same decision: we fail to reject H0 at alpha = 0.05. Thus, according to the t.test result, there is not enough evidence to conclude that the true mean of Y is smaller than 158.9811. <br/>]{style="color:blue"} 


# 2.2 Parametric hypothesis about the equality of two means from different samples $Y$ and $Z$, when variances are unknown

(1 point)

-   Perform two.tailed test and one one.tailed test to compare means of
    two different samples $Y$ and $Z$ based on statistical data <br/>
-   For this purpose it is recommended to use built-in function t.test
    and follow the theory. Carefully choose the parameters of used
    function. <br/>
-   Answer the questions below. <br/>

```{r}
mean_Y <- mean(Y); mean_Z <- mean(Z)
sd_Y <- sd(Y); sd_Z <- sd(Z)

cat("mean of Y =",mean_Y)
cat("mean of Z =",mean_Z)
cat("SD of Y =",sd_Y)
cat("SD of Z =",sd_Z)

# Two-sided test
tt_yz_two <- t.test(Y, Z, alternative = "two.sided", var.equal = FALSE)
cat("tt_yz_two =")
print(tt_yz_two)

# One-tailed tests
tt_yz_greater <- t.test(Y, Z, alternative = "greater")  # mean(Y) > mean(Z)
cat("tt_yz_greater =")
print(tt_yz_greater)

tt_yz_less    <- t.test(Y, Z, alternative = "less")     # mean(Y) < mean(Z)
cat("tt_yz_less =")
print(tt_yz_less)
```

[Means of Y and Z: <br/> 
- mean(Y) = 130.7553 <br/> 
- mean(Z) = 225.0230 <br/> 
Standard deviations of Y and Z: <br/> 
- sd(Y) = 367.0212 <br/> 
- sd(Z) = 439.2312 <br/>]{style="color:blue"}
[Formulate hypothesis for two.tailed test. <br/> 
- H0: μY = μZ  (the means of Y and Z are equal) <br/> 
- Ha: μY ≠ μZ  (the means of Y and Z are different) <br/> 
- Conclusion: From the Welch two-sample t-test, we obtained t = −1.6469 with p-value = 0.1012. Since p = 0.1012 > 0.05, we fail to reject the null hypothesis at the 5% significance level. There is no statistically significant evidence that the mean of Y differs from the mean of Z. <br/>]{style="color:blue"}

[Formulate hypothesis for one one.tailed test. <br/> 
We test whether the mean of Y is less than the mean of Z. <br/> 
- H0: μY >= μZ  (mean of Y is greater or equal to mean of Z) <br/> 
- Ha: μY < μZ  (mean of Y is smaller than mean of Z) <br/> 
- Conclusion: The left-tailed t-test gives t = −1.6469 and p-value = 0.05061. Since p ~ 0.0506 is slightly above 0.05, we fail to reject H0 at the 5% level. This means that although the sample mean of Y is numerically smaller, we do not have strong enough statistical evidence to conclude that μY < μZ. <br/>]{style="color:blue"}


# 3. One-way ANOVA: task

You are given two data samples in the separate files. <br/> -
'Treatment.csv' file includes the response variable "Score" and
additional variables (factors): HealthCondition, AgeGroup, Region,
DrugType, and TreatmentDuration. <br/> - 'Competition.csv' file includes
the response variable "PerformanceTime" and additional variables
(factors): EventType, Method, AgeGroup, Country, and Experience. <br/>

-   The objective is to determine the existence of a statistically
    significant difference among the means of response variable
    conditioned by some factor. Method - ANOVA analysis. All steps
    and tasks are given below. <br/>

You can follow the theory slides of the examples here:
<https://www.datanovia.com/en/lessons/anova-in-r/> or
<http://www.sthda.com/english/wiki/one-way-anova-test-in-r#what-is-one-way-anova-test>
or
<http://www.sthda.com/english/wiki/kruskal-wallis-test-in-r#multiple-pairwise-comparison-between-groups>

# 3.1 Import data

(0.5 point)

-   If the last digit of your student's vidko (ID) is 0-4, then import
    data file 'Treatment.csv'. The response variable "Score", factor
    "Xn", where n - is the last digit of your student's vidko (ID).
    <br/>

-   If the last digit of your student's vidko (ID) is 5-9, then import
    data file 'Competition.csv'. The response variable
    "PerformanceTime", factor "Xn", where n - is the last digit of
    your student's vidko (ID). <br/><br/>

-   Answer the questions below. <br/>

```{r, warning=FALSE, message=FALSE}
# Select response and factor
Competition <- read.csv("Competition.csv", skip = 1)

# Select response and factor (X5 = EventType)
df <- data.frame(
  PerformanceTime = Competition$PerformanceTime,
  EventType = factor(Competition$EventType),
  
  Method = factor(Competition$Method),
  AgeGroup = factor(Competition$AgeGroup),
  Country = factor(Competition$Country),
  Experience = factor(Competition$Experience)
)

# Levels of the chosen factor X5 = EventType
levels(df$EventType)
length(levels(df$EventType))


length(levels(df$Method))
length(levels(df$AgeGroup))
length(levels(df$Country))
length(levels(df$Experience))
```

- How many factors you have in the sample?  
There are 5 factors in the Competition dataset (X5–X9): EventType, Method, AgeGroup, Country, and Experience.

- How many levels each factor has?  
EventType has 4 levels. 
Method has 4 levels.  
AgeGroup has 5 levels.  
Country has 6 levels.  
Experience has 4 levels.  

# 3.2 Describe and visualise your data

(0.5 point)

-   Compute the grand mean and standard deviation. <br/>
-   Compute the mean and standard deviation for each factor level to
    compare the effect. <br/>
-   Visualise your sample with the aim to show how the mean of response
    variable differs against levels of factor you have in the sample.
    <br/>
-   Answer the questions below. <br/>

```{r, warning=FALSE, message=FALSE}
# Grand mean and SD
grand_mean <- mean(df$PerformanceTime)
grand_sd   <- sd(df$PerformanceTime)

cat("Mean =",grand_mean)
cat("SD =",grand_sd)

# Mean and SD per level
library(dplyr)

level_stats <- df %>%
  group_by(EventType) %>%
  summarise(
    mean_PT = mean(PerformanceTime),
    sd_PT   = sd(PerformanceTime),
    count   = n()
  )

level_stats

# Boxplot
boxplot(PerformanceTime ~ EventType, data=df,
        main="Performance Time by Event Type",
        xlab="Event Type",
        ylab="Performance Time",
        col="lightgray")

means <- tapply(df$PerformanceTime, df$EventType, mean)
plot(means, type="b", pch=19,
     main="Mean Performance Time by Event Type",
     xlab="Event Type",
     ylab="Mean Performance Time",
     xaxt="n")
axis(1, at=1:length(means), labels=names(means))

# Effect plot still works from the 'effects' package
library(effects)
model <- lm(PerformanceTime ~ EventType, data=df)
plot(allEffects(model))
```

[<br/> - Are the means different? 
Yes. The means for PerformanceTime differ across the EventType levels.<br/> - Are the difference among means
is statistically significant? The numerical differences and the boxplots/mean plots indicate clear separation between the means]{style="color:blue"}

# 3.3 Design one-way ANOVA experiment

(1 point)

[Formulate H0 and Ha. Determine a significance level, for example
$\alpha=0.05$. <br/> $H_0$: μ_Hurdles = μ_Marathon = μ_Relay = μ_Sprint <br/> $H_a$: At least one group mean is different from the others.]{style="color:blue"}

# 3.4 Check one-way ANOVA assumptions.

(1 point)

1.  Check the homogeneity of variance assumption (Levene's test). <br/>
2.  Check the normality assumption (Shapiro-Wilk test). <br/>
3.  Check for significant outliers (If extreme outliers exist, remove
    them). <br/>

```{r, warning=FALSE, message=FALSE}
anova_model <- aov(PerformanceTime ~ EventType, data=df)

# 1. Homogeneity of variance: Levene's test
library(car)
leveneTest(PerformanceTime ~ EventType, data=df)

# 2. Normality of residuals: Shapiro-Wilk test
resid_vals <- residuals(anova_model)
shapiro.test(resid_vals)

# 3. Check for outliers
boxplot(df$PerformanceTime ~ df$EventType, main="Outliers by EventType")
```

[Give comments about each of test verified. <br/> 
1. Homogeneity of variance (Levene’s test):  
   Levene’s test returned F = 84.92 and p ≈ 2.55×10⁻⁵^2.  
   Since p < 0.05, the variances among EventType groups are not equal.  
   Therefore, the homogeneity assumption is violated.
 <br/> 2. Normality of residuals (Shapiro–Wilk test):  
   The Shapiro–Wilk test gave W = 0.70035 with p < 2.2×10⁻¹⁶.  
   Since p < 0.05, residuals deviate strongly from normality.  
   The ANOVA normality assumption is violated.
<br/> 3. Outliers:  
   Boxplots and summary statistics indicate the presence of extreme outliers,  
   especially in groups with large SD (e.g., Marathon).  
   Therefore, the data contain significant outliers. 
<br/> Then, make a final decision what procedure
(parametric or non-parametric) will be applied for further analysis.
<br/> Student's comments: Both core ANOVA assumptions (normality and equal variances) are violated severely.  
Thus, parametric one-way ANOVA is not appropriate for this dataset.  
A non-parametric test, specifically the Kruskal–Wallis test, should be used instead.]{style="color:blue"}

# 3.5 Compute one-way ANOVA for the given factor.

(1 point)

-   Perform ANOVA for 'Xn' factor. <br/>
-   Make a conclusion from ANOVA. <br/>

```{r, warning=FALSE, message=FALSE}
# res.aov <- aov(response ~ factor, data = my_data) #F-test
# summary(res.aov)
# res.aov <-oneway.test(response ~ factor, data = my_data)  # Welch one-way test could be used instead of classical approach with F test
# kruskal.test(response ~ factor, data = my_data) # Kruskal-Wallis rank sum test could be used instead of classical approach with F test
anova_model <- aov(PerformanceTime ~ EventType, data = df)
summary(anova_model)
welch <- oneway.test(PerformanceTime ~ EventType, data = df)
welch
kruskal.test(PerformanceTime ~ EventType, data = df)
```

[Formulate H0 and Ha. Determine a significance level $\alpha=0.05$.
<br/> 
$H_0$: μ_Hurdles = μ_Marathon = μ_Relay = μ_Sprint 
<br/> $H_a$: At least one EventType group has a different mean PerformanceTime. 
<br/> Student's comments:<br/>
At least one EventType group has a different mean PerformanceTime. performed classical one-way ANOVA using PerformanceTime as the response variable and EventType as the factor.  
The ANOVA F-test returned F = 9383 with p-value < 2e-16, which is far below α = 0.05.  
Therefore, I reject the null hypothesis.  
This means that the mean PerformanceTime is significantly different across EventType levels.  
The Welch ANOVA (unequal variances) and the Kruskal–Wallis test (non-parametric) also produced extremely small p-values (< 2.2e-16), confirming the same conclusion.  
Overall, the event type has a strong and statistically significant effect on PerformanceTime.  
]{style="color:blue"}

# 3.6 Multiple pairwise-comparison between the means of groups

(1 point)

Decide whether it makes sense to perform multiple pairwise-comparisons
for all factors you have in the sample. If yes - perform multiple
pairwise-comparisons and make final decision about the differences among
groups. If not - just give comments why this step is not relevant.

```{r, warning=FALSE, message=FALSE}
# TukeyHSD(res.aov)
# pairwise.wilcox.test(response, factor,  p.adjust.method = "BH")
# Alternatively to Wilcox, Dunn test could be applied. Here,  Reject Ho if p <= alpha/2
TukeyHSD(anova_model)
pairwise.wilcox.test(df$PerformanceTime, df$EventType,
                     p.adjust.method = "BH")
```

[Student's comments: since ANOVA showed strong evidence of differences among EventType groups, performing multiple comparisons is appropriate.
Tukey HSD and Wilcoxon post-hoc tests show that all pairs of EventType groups differ significantly (all adjusted p-values < 0.05).
Conclusion: every event type (Hurdles, Marathon, Relay, Sprint) has a different mean PerformanceTime compared to every other event type.
]{style="color:blue"}
