Analysis of quantitative factors in split-plot design

Oct 26, 2022 R

Analysis of Messy Data Vol.2, Milliken and Johnson, 2009. pg 440

In this experiment, researchers are interested in evaluating the effect of moisture and fertilizer on dry weight of wheat. There were 48 different peat pots and 12 plastic trays; four pots could be put into each tray. The moisture treatments consisted of adding 10, 20, 30, or 40 ml of water per pot per day to the tray where the water was absorbed by the peat pots. The levels of moisture were randomly assigned to the trays. The trays are the large size of experimental unit or whole plot, and the whole plot design is a one-way treatment structure in a completely randomized design structure.

The levels of fertilizer were 2, 4, 6, or 8 mg per pot. The four levels of fertilizer were randomly assigned to the four pots in each tray so that each fertilizer occurred once in each tray. The pot is the smallest size of experimental unit or split-plot or subplot, and the subplot design is a one-way treatment structure in a randomized complete block design structure where the 12 trays are the blocks.

Loading packages

library(pacman)

p_load(lmerTest,
       reshape2,
       emmeans,
       plotly)

options(digits = 5,contrasts=c("contr.sum","contr.poly"))

NOTE: There may be better/easier ways to perform the analysis/data wrangling. Modifications are encouraged.

First, we need to transform the data from wide to long format. That can be done with melt function, as shown below. After reshaping the dataset and renaming variables, we create quantitative and qualitative variables, so we can perform ANOVA and estimate the regression coefficients.

Loading data

load("d:/R programs/Split-plot quantitative factors/data.RData")
data$Tray=as.factor(data$Tray)
data<- data %>% melt(id=c("MST","Tray"))
names(data)[3:4]<-c("FERT","Y")
data$fert= as.numeric(data$FERT)*2
data$mst= data$MST
data$MST1=as.factor(data$MST)
str(data)

## 'data.frame':    48 obs. of  7 variables:
##  $ MST : int  10 10 10 20 20 20 30 30 30 40 ...
##  $ Tray: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
##  $ FERT: Factor w/ 4 levels "Fert2","Fert4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Y   : num  3.35 4.04 1.98 5.05 5.91 ...
##  $ fert: num  2 2 2 2 2 2 2 2 2 2 ...
##  $ mst : int  10 10 10 20 20 20 30 30 30 40 ...
##  $ MST1: Factor w/ 4 levels "10","20","30",..: 1 1 1 2 2 2 3 3 3 4 ...

Note that uppercase are factors and lowecase are numeric. Now we run anova using linear mixed model with lmer function from lme4 and lmerTest package.

Performing ANOVA in qualitative factors

lmer1<-lmer(Y~MST1*FERT+(1|Tray:MST),REML=TRUE,data=data)
anova(lmer1,ddf="Kenward-Roger")

## Type III Analysis of Variance Table with Kenward-Roger's method
##           Sum Sq Mean Sq NumDF DenDF F value  Pr(>F)    
## MST1        59.4    19.8     3     8   26.34 0.00017 ***
## FERT       297.1    99.0     3    24  131.65 4.9e-15 ***
## MST1:FERT   38.1     4.2     9    24    5.62 0.00034 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Main factors and interaction are statistically significant (P<0.05).

Visualizing the interaction

par(mfrow=c(1,2))
interaction.plot(data$MST1,data$FERT,data$Y)
interaction.plot(data$FERT,data$MST1,data$Y)

The graphs show the response to fertilizer for each moisture level and the response to moisture for each fertilizer level. It seems responses are quadratic for moisture and linear for fertilizer. We need to test that.

Variance components

print(VarCorr(lmer1),comp=c("Variance"),digits=4)

##  Groups   Name        Variance
##  Tray:MST (Intercept) 0.6636  
##  Residual             0.7521

Estimating the errors to test defined contrasts for linear, quadratic or cubic trends.

Contrasts

Fertilizer

There are four levels of fertilizer so the variance contrast to test linear trend at the same whole plot treatment (moisture) is:

\(\frac{\sigma^2_p}{3}[(-3)^2+(-1)^2+1^2+3^2] = \frac{20\sigma^2_p}{3}\)

and for quadratic trend is:

\[\frac{\sigma^2_p}{3}[(-1)^2+(-1)^2+1^2+1^2] = \frac{4\sigma^2_p}{3}\] and the cubic trend is:

\[\frac{\sigma^2_p}{3}[(-1)^2+(3)^2+(-3)^2+1^2] = \frac{20\sigma^2_p}{3}\]

The estimated standard errors are obtained by replacing \(\sigma^2\) with MSError(pot) in the square root of the respective variance (MSE=0.7521).

Moisture

Similarly, to test moisture (comparisons of whole plot treatments at the same subplot treatment) are: \[\frac{\sigma^2_{pot}+\sigma^2_{tray}}{3}[(-3)^2+(-1)^2+1^2+3^2] = \frac{20(\sigma^2_{pot}+\sigma^2_{tray})}{3}\]
\[\frac{\sigma^2_{pot}+\sigma^2_{tray}}{3}[(-1)^2+(-1)^2+1^2+1^2] = \frac{4(\sigma^2_{pot}+\sigma^2_{tray})}{3}\]
\[\frac{\sigma^2_{pot}+\sigma^2_{tray}}{3}[(-1)^2+(3)^2+(-3)^2+1^2] = \frac{20(\sigma^2_{pot}+\sigma^2_{tray})}{3}\]

for linear and quadratic, respectively. Variance components are inserted in the formula to obtain estimates of standard errors.

###Testing for significance of linear-quadratic-cubic trends.

emm1<-emmeans(lmer1,pairwise ~FERT|MST1)
contrast(emm1[[1]],"poly")

## MST1 = 10:
##  contrast  estimate   SE df t.ratio p.value
##  linear     10.0922 2.24 24   4.507  0.0001
##  quadratic  -0.0348 1.00 24  -0.035  0.9725
##  cubic      -0.2465 2.24 24  -0.110  0.9133
## 
## MST1 = 20:
##  contrast  estimate   SE df t.ratio p.value
##  linear     27.6660 2.24 24  12.355  <.0001
##  quadratic   1.6810 1.00 24   1.679  0.1062
##  cubic      -0.3504 2.24 24  -0.156  0.8770
## 
## MST1 = 30:
##  contrast  estimate   SE df t.ratio p.value
##  linear     29.0017 2.24 24  12.951  <.0001
##  quadratic   0.4346 1.00 24   0.434  0.6682
##  cubic       0.1908 2.24 24   0.085  0.9328
## 
## MST1 = 40:
##  contrast  estimate   SE df t.ratio p.value
##  linear     21.7646 2.24 24   9.720  <.0001
##  quadratic   1.0478 1.00 24   1.046  0.3058
##  cubic      -5.5881 2.24 24  -2.495  0.0198
## 
## Degrees-of-freedom method: kenward-roger

There is a significant linear response to fertilizer for each level of moisture and no significant quadratic trend.

The Satterthwaite method can be used to approximate degrees of freedom associated with \(\sigma^2_{pot}+\sigma^2_{tray}\). As seen below, df=19.29.

emm2<-emmeans(lmer1,pairwise ~MST1|FERT)
contrast(emm2[[1]],"poly")

## FERT = Fert2:
##  contrast  estimate   SE   df t.ratio p.value
##  linear       8.765 3.07 19.3   2.853  0.0101
##  quadratic   -3.768 1.37 19.3  -2.743  0.0128
##  cubic        0.442 3.07 19.3   0.144  0.8872
## 
## FERT = Fert4:
##  contrast  estimate   SE   df t.ratio p.value
##  linear       8.303 3.07 19.3   2.703  0.0140
##  quadratic   -6.833 1.37 19.3  -4.973  0.0001
##  cubic       -2.596 3.07 19.3  -0.845  0.4085
## 
## FERT = Fert6:
##  contrast  estimate   SE   df t.ratio p.value
##  linear      16.583 3.07 19.3   5.398  <.0001
##  quadratic   -7.612 1.37 19.3  -5.540  <.0001
##  cubic        0.261 3.07 19.3   0.085  0.9333
## 
## FERT = Fert8:
##  contrast  estimate   SE   df t.ratio p.value
##  linear      18.123 3.07 19.3   5.899  <.0001
##  quadratic  -11.779 1.37 19.3  -8.573  <.0001
##  cubic        2.045 3.07 19.3   0.666  0.5136
## 
## Degrees-of-freedom method: kenward-roger

#Note that in the book, the estimates for the linear contrasts on level of moisture are incorrect...

There are both linear and quadratic responses to moisture for each level of fertilizer.

Regression with split-plot errors

To obtain an appropriate analysis, one needs to use moisture as a continuous variable in the regression model and as a class variable in the random statement. “mst” is used to denote moisture as a continuous variable and “MST”” is used to denote the class variable. The random statement inserts the tray error term and imposes the correlation structure on the data.

Full model Y= \(\beta\)₀ + \(\beta_1~_{mst}\) + \(\beta_2~_{mst{^2}}\) + \(\beta_3~_{mst{^3}}\) + \(\beta_4~_{fert}\) + \(\beta_5~_{fert{^2}}\) + \(\beta_6~_{fert{^3}}\) + \(\beta_7~_{mst*fert}\) + \(\beta_8~_{mst{^2}*fert}\) + \(\beta_9~_{mst{^3}*fert}\) + \(\beta_{10}~_{mst*fert{^2}}\) + \(\beta_{11}~_{fert{^2}*mst{^2}}\) + \(\beta_{12}~_{fert{^2}*mst{^3}}\) + \(\beta_{13}~_{mst*fert{^3}}\) + \(\beta_{14}~_{mst{^2}*fert{^3}}\) + \(\beta_{15} ~_{fert{^3}*mst{^3}}\) + tray(MST) + e.

lmer2a<-lmer(Y~mst+I(mst*mst)+I(mst*MST*MST)+fert+I(fert*fert)+I(fert*fert*fert)+I(mst*fert)+
              I(mst*mst*fert)+I(mst*mst*mst*fert)+I(mst*fert*fert)+I(mst*mst*fert*fert)+
              I(mst*mst*mst*fert*fert)+I(mst*fert*fert*fert)+I(mst*mst*fert*fert*fert)+I(mst*mst*mst*fert*fert*fert)+
              (1|MST1:Tray), REML = TRUE,data=data)

summary(lmer2a)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Y ~ mst + I(mst * mst) + I(mst * MST * MST) + fert + I(fert *  
##     fert) + I(fert * fert * fert) + I(mst * fert) + I(mst * mst *  
##     fert) + I(mst * mst * mst * fert) + I(mst * fert * fert) +  
##     I(mst * mst * fert * fert) + I(mst * mst * mst * fert * fert) +  
##     I(mst * fert * fert * fert) + I(mst * mst * fert * fert *  
##     fert) + I(mst * mst * mst * fert * fert * fert) + (1 | MST1:Tray)
##    Data: data
## 
## REML criterion at convergence: 294.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4208 -0.5731 -0.0921  0.4582  2.1132 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  MST1:Tray (Intercept) 0.664    0.815   
##  Residual              0.752    0.867   
## Number of obs: 48, groups:  MST1:Tray, 12
## 
## Fixed effects:
##                                          Estimate Std. Error        df t value
## (Intercept)                             -2.14e+01   3.48e+01  2.49e+01   -0.61
## mst                                      3.97e+00   5.32e+00  2.49e+01    0.75
## I(mst * mst)                            -1.86e-01   2.35e-01  2.49e+01   -0.79
## I(mst * MST * MST)                       2.72e-03   3.12e-03  2.49e+01    0.87
## fert                                     1.48e+01   2.64e+01  2.41e+01    0.56
## I(fert * fert)                          -2.93e+00   5.84e+00  2.41e+01   -0.50
## I(fert * fert * fert)                    1.56e-01   3.88e-01  2.41e+01    0.40
## I(mst * fert)                           -2.58e+00   4.04e+00  2.41e+01   -0.64
## I(mst * mst * fert)                      1.33e-01   1.79e-01  2.41e+01    0.74
## I(mst * mst * mst * fert)               -2.05e-03   2.37e-03  2.41e+01   -0.87
## I(mst * fert * fert)                     5.26e-01   8.93e-01  2.41e+01    0.59
## I(mst * mst * fert * fert)              -2.67e-02   3.94e-02  2.41e+01   -0.68
## I(mst * mst * mst * fert * fert)         4.13e-04   5.24e-04  2.41e+01    0.79
## I(mst * fert * fert * fert)             -2.88e-02   5.93e-02  2.41e+01   -0.49
## I(mst * mst * fert * fert * fert)        1.52e-03   2.62e-03  2.41e+01    0.58
## I(mst * mst * mst * fert * fert * fert) -2.42e-05   3.48e-05  2.41e+01   -0.70
##                                         Pr(>|t|)
## (Intercept)                                 0.54
## mst                                         0.46
## I(mst * mst)                                0.44
## I(mst * MST * MST)                          0.39
## fert                                        0.58
## I(fert * fert)                              0.62
## I(fert * fert * fert)                       0.69
## I(mst * fert)                               0.53
## I(mst * mst * fert)                         0.46
## I(mst * mst * mst * fert)                   0.39
## I(mst * fert * fert)                        0.56
## I(mst * mst * fert * fert)                  0.50
## I(mst * mst * mst * fert * fert)            0.44
## I(mst * fert * fert * fert)                 0.63
## I(mst * mst * fert * fert * fert)           0.57
## I(mst * mst * mst * fert * fert * fert)     0.49
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

Most of the significance levels are quite large, so several steps of a deletion process were carried out until all of the remaining variables had coefficients that were significantly different from zero.

Reduced model Y= \(\beta\)₀ + \(\beta_1~_{mst}\) + \(\beta_4~_{fert}\) + \(\beta_7~_{mst*fert}\) + \(\beta_8~_{mst{^2}*fert}\) + \(\beta_{10}~_{mst*fert{^2}}\) + tray(MST) + e.

It is of interest to determine if the reduced model adequately describes the data, so a lack of fit test is constructed.

Testing for lack of fit

lmer3<-lmer(Y~mst + fert + I(mst*fert) + I(mst*mst*fert)+ I(mst*fert*fert)+FERT:MST1+(1|Tray:MST1), REML = TRUE,data=data)
anova(lmer3,ddf="Kenward-Roger")

## Type III Analysis of Variance Table with Kenward-Roger's method
##                      Sum Sq Mean Sq NumDF DenDF F value Pr(>F)   
## mst                    0.06    0.06     1  27.3    0.08 0.7854   
## fert                   0.02    0.02     1  30.1    0.03 0.8616   
## I(mst * fert)          7.92    7.92     1  18.1   10.53 0.0045 **
## I(mst * mst * fert)    6.29    6.29     1  19.3    8.37 0.0092 **
## I(mst * fert * fert)   2.58    2.58     1  24.0    3.43 0.0765 . 
## FERT:MST1              6.83    0.68    10  24.7    0.89 0.5561   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The significance level corresponding to the test of lack of fi t is 0.5561, indicating the reduced model adequately describes the data.

Model Coefficients

lmer3a<-lmer(Y~mst + fert + I(mst*fert) + I(mst*mst*fert)+ I(mst*fert*fert)+(1|Tray:MST1), REML = TRUE,data=data)
summary(lmer3a)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Y ~ mst + fert + I(mst * fert) + I(mst * mst * fert) + I(mst *  
##     fert * fert) + (1 | Tray:MST1)
##    Data: data
## 
## REML criterion at convergence: 178.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5070 -0.5291  0.0653  0.4456  1.6692 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  Tray:MST1 (Intercept) 0.558    0.747   
##  Residual              0.757    0.870   
## Number of obs: 48, groups:  Tray:MST1, 12
## 
## Fixed effects:
##                       Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)           1.953558   0.920448 32.987398    2.12    0.041 *  
## mst                   0.075312   0.040690 40.622841    1.85    0.071 .  
## fert                 -1.079512   0.231841 37.847605   -4.66  3.9e-05 ***
## I(mst * fert)         0.172963   0.022470 36.511857    7.70  3.7e-09 ***
## I(mst * mst * fert)  -0.003463   0.000373 25.441178   -9.28  1.2e-09 ***
## I(mst * fert * fert)  0.001838   0.001147 32.437572    1.60    0.119    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) mst    fert   I(*fr) I(*m*f
## mst         -0.754                            
## fert        -0.444  0.335                     
## I(mst*fert)  0.153 -0.426 -0.789              
## I(ms*m*frt)  0.000  0.000  0.805 -0.830       
## I(ms*f*frt)  0.000  0.564  0.000 -0.510  0.000
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

Transforming the factors back to numeric.

a<-data.frame(emmeans(lmer1,~FERT*MST1))[,1:3]
a$FERT=as.numeric(a$FERT)*2
a$MST1=as.numeric(a$MST1)*10

Using the model to predict surface with an expanded dataset.

pred_grid <- expand.grid(fert= seq(2,8, by=0.5), mst  = seq(10,40, by=1))
predicted=function(x,y){
  z=1.9535+x*0.07531-y*1.07951+x*y*0.17296-x*x*y*0.003463+x*y*y*0.0018379
}
pred_grid$predicted<-predicted(pred_grid$mst,pred_grid$fert)

plot_ly(x = ~pred_grid$mst, y = ~pred_grid$fert, z = ~pred_grid$predicted, type = "mesh3d") %>% 
  add_markers(x = ~a$MST1, y = ~a$FERT, z = ~a$emmean) %>%
  layout(scene = list(xaxis = list(title = 'Moisture'),
                     yaxis = list(title = 'Fertilizer'),
                     zaxis = list(title = 'Dry matter mean')))

The surface was fit with the reduced model indicated above. The regression model does a good job of describing the cell means (orange circles).

R Statistics

Vinicius Garnica

Research and Teaching Assistant

My research interests include agronomy and plant pathology matter.