R code: Equivalent of gsub for a vector of patterns

December 18, 2013 by Aurélien

I need to change some words in a vector, but I do not want to use gsub as I have a vector of ‘patterns’ to change which would require a loop. I have come up with the piece of code below.

The first step is to create a test dataset:

test <- c("War", "Freedom", "Ignorance")
test <- cbind(rep(test, 2), rep("is"), rep(test, 2))

test
#      [,1]        [,2] [,3]
# [1,] "War"       "is" "War"
# [2,] "Freedom"   "is" "Freedom"
# [3,] "Ignorance" "is" "Ignorance"
# [4,] "War"       "is" "War"
# [5,] "Freedom"   "is" "Freedom"
# [6,] "Ignorance" "is" "Ignorance"

A matrix with the different strings to look for and the replacement text is created.

str12 <- data.frame(
str1 = c("War", "Freedom", "Ignorance"),
str2 = c("Peace", "Slavery", "Strength"))

The function is applied to the test dataset.

test[,3] <- sapply(test[, 3], function(x){
as.character(str12$str2[match(x, str12$str1)])
})

The result is:

test
     [,1]        [,2] [,3]      
[1,] "War"       "is" "Peace"   
[2,] "Freedom"   "is" "Slavery" 
[3,] "Ignorance" "is" "Strength"
[4,] "War"       "is" "Peace"   
[5,] "Freedom"   "is" "Slavery" 
[6,] "Ignorance" "is" "Strength"

Posted in Uncategorized | Tagged gsub, R code | 1 Comment »

Milk recording

February 4, 2013 by Aurélien

An extremely important source of data for epidemiological studies in dairy cattle is milk recording. Outside of Europe, milk recording is sometimes called dairy herd improvement (DHI). In France, it is called contrôle laitier. International guidelines on how to perform milk recording are published by the International Committee for Animal Recording (ICAR).

The basic principles are the following:

Recording is performed on all the lactating cows of a herd on a regular basis. The reference method consists in 4 week intervals with at least 11 recordings per herd per year.
Two consecutive milkings are used per recording
Milk yield is measured on all lactating cows
A milk sample is taken from each cow for the determination of butterfat content, protein content and somatic cell count.

The data are used for different purposes. They allow farmers to know how much milk each cow produces as well as the content of the milk. Milk somatic cell count is a marker of inflammation and is used to detect mastitis. Milk recording is also used for the evaluation of the genetic value of bulls and cows.

For us epidemiologists, it is an invaluable source of information. It can be used to assess the effect of diseases on milk production (e.g. mastitis, bluetongue), to describe how variable herds are at a national level (e.g. mastitis).

Posted in Uncategorized | Tagged Milk Recording | Leave a Comment »

R code: Drawing an arrow using origin, length and angle

July 12, 2012 by Aurélien

The R arrows() function draws an arrow taking the coordinates of the origin and end point of the arrow as arguments. I need to draw an arrow using the coordinates of the origin, the length of the arrow and an angle. To do that, I have written the following function:

arrows.1 <- function(x0, y0, length.ar, angle.ar, ...){

ab <- cos(angle.ar) * length.ar
bc <- sign(sin(angle.ar)) * sqrt(length.ar^2 - ab^2)

x1 <- x0 + ab
y1 <- y0 + bc

arrows(x0, y0, x1, y1, ...)
}

The function is tested on a circle:

## Circle function
circle <- function(xorig, yorig, radius, add, ...){

x <- seq(-radius, radius, length.out = 1000)

y <- sapply(x, function(z) sqrt(radius^2 - z^2))

if(add == TRUE){

lines(xorig + c(x, rev(x)), c(yorig + y, yorig + rev(-y)), type = "l", ...)

} else {

plot(xorig + c(x, rev(x)), c(yorig + y, yorig + rev(-y)), type = "l", ...)

}

}

## Test
circle(1, 1, 1, add = FALSE)
for(i in seq(0, 2 * pi, length.out = 9)){
arrows.1(x0 = 1, y0 = 1, length.ar = 1, angle.ar = i)
}

Posted in Uncategorized | Tagged R code | Leave a Comment »

R code: How to count the number of occurrences of a substring within a string

May 24, 2012 by Aurélien

The idea is to count the number of times a pattern appears within a character string. This can be achieved by combining grep and strsplit. Here is the function:

strcount <- function(x, pattern, split){

unlist(lapply(
    strsplit(x, split),
       function(z) na.omit(length(grep(pattern, z)))
   ))

}

Here is an example:

txt <- "The message is that there are no knowns. There are things we know that we know. There are known unknowns. That is to say there are things that we now know we don't know. But there are also unknown unknowns. There are things we do not know we don't know. So when we do the best we can and we pull all this information together, and we then say well that's basically what we see as the situation, that is really only the known knowns and the known unknowns. And each year, we discover a few more of those unknown unknowns."

strcount(tolower(txt), "know", " ")
#[1] 17

strcount(tolower(txt), "^known", " ")
#[1] 5

strcount(tolower(txt), "unknown", " ")
#[1] 6

Posted in Uncategorized | Tagged R code, Regular expression | 6 Comments »

R code: How to convert a list to a data.frame

May 22, 2012 by Aurélien

This a problem I have had a few times where for example tapply returns a list and I need to convert it to a data.frame.

myList <- list(
            A = 1:3,
            B = 1:5)
myList
# $A
# [1] 1 2 3
#
# $B
# [1] 1 2 3 4 5

A simple solution to this is:

data.frame(
         lNames = rep(names(myList), lapply(myList, length)),
         lVal = unlist(myList))

#      lNames lVal
# A1      A    1
# A2      A    2
# A3      A    3
# B1      B    1
# B2      B    2
# B3      B    3
# B4      B    4
# B5      B    5

Posted in Uncategorized | Tagged R code | Leave a Comment »

Risk factors for a high somatic cell count at the first milk recording in a large sample of UK dairy herds

May 8, 2012 by Aurélien

On this page is the R code to reproduce the results from the individual cow model reported in the article: Risk factors for a high somatic cell count at the first milk recording in a large sample of UK dairy herds.

I have put a file with all the functions online. The detail is in the sections below, but it is probably simpler to use source() on the online version:

source("https://aurelienmadouasse.files.wordpress.com/2012/05/dryperiodsim.doc")

Simulations are performed by the Simul() functions. Here is how it works:

Simul(DofParity = 1, # Parity at drying off
      DofMilk = 15, # Milk yield at drying off (kg)
      CalvDim = 15, # Days in milk on the fist milk recording
      CalvDay = 200, # Day of year at calving (1 to 365)
      DofHerdPrev = .4, # Herd prevalence of SCC > 200 on last milk recording before drying off
      pStyh = .27, # Herd probability of staying > 200 during the dry period (1 - cure)
      pNinh = .17, # Herd probability of becoming > 200 during the dry period
      nRep = 100 # Number of simulations
)

All the variables are held at some average or plausible value except for the one which is looked at. For example, milk yield at the last recording before calving was the variable that had the strongest association with the probability of having a high SCC on the first milk recording following calving. We predict this probability for a range of milk yields. Here is how this is done:

rm(Sim)

for(i in seq(5, 30, by = 5)){

Sim <- Simul(DofParity = 1,
               DofMilk = i,
               CalvDim = 15,
               CalvDay = 200,
               DofHerdPrev = .4,
               pStyh = .27,
               pNinh = .17,
               nRep = 100)

}

plot(Cur50 ~ Milk, data = Sim,
     pch = 20, col = "blue",
     xlab = "Milk yield at drying off",
     ylim = c(0, 1.1), yaxt = "n",
     ylab = "Proportion of eligible cows",
     frame.plot = FALSE)
points(Ninf50 ~ Milk, data = Sim,
       pch = 20, col = "red")
with(Sim,
     segments(Milk, Cur2.5, Milk, Cur97.5, col = "blue"))
with(Sim,
     segments(Milk, Ninf2.5, Milk, Ninf97.5, col = "red"))
axis(side = 2, at = seq(0, 1, by = .1), las = 2)
legend("top",
       c("Cure rate", "New infection rate"),
       col = c("blue", "red"), pch = 20,
       bty = "n", horiz = TRUE)

The dots on the graph represent the median and the bars the range in which 95% of the simulations are.

Functions

## Inverse logit
invlogit
exp(x) / (1 + exp(x))

}

## Legendre polynomials
poly.legendre <- function(x, degree, minx, maxx){

if (min(x) < minx) stop("min(x) < minx")
if (max(x) > maxx) stop("max(x) > maxx")
if (degree > 4) stop("Degree must be < 5")

lg <- length(x)

z <- data.frame(x)

z$xP1 <- 2 * (x - minx) / (maxx - minx) - 1
z$xP2 <- (3 * z$xP1 * z$xP1 - 1) / 2
if(degree > 2) z$xP3 <- (5 * z$xP2 * z$xP1 - 2 * z$xP1) / 3
if(degree > 3) z$xP4 <- (7 * z$xP3 * z$xP1 - 3 * z$xP2) / 4

z  <- z[, -1]
}

See here for more on Legendre polynomials.

Model coefficients

res <- structure(list(
node = structure(c(1L, 12L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L),
.Label = c("beta[1]", "beta[10]", "beta[11]", "beta[12]",
"beta[13]", "beta[14]", "beta[15]", "beta[16]",
"beta[17]", "beta[18]", "beta[19]", "beta[2]",
"beta[20]", "beta[3]", "beta[4]", "beta[5]",
"beta[6]", "beta[7]", "beta[8]", "beta[9]",
"mu[1]", "mu[2]", "rho", "sigma[1]", "sigma[2]",
"theta[1]", "theta[2]", "theta[3]", "theta[4]"),
class = "factor"),
mean = c(0.6474, 0.5376, -0.131, -0.2592, 0.2117, -0.09114,
-0.006014, 0.2137, 0.05254, -0.03646, 0.4143, 0.5464,
-0.1702, -0.3673, 0.2021, -0.163, -0.01338, 0.1949,
0.08277, -0.01062, -1.218, -1.032, 0.8246, 0.3137,
0.3546, 0.6099, 1.418, -0.7655, 1.621),
sd = c(0.03272, 0.02534, 0.02431, 0.0162, 0.02056, 0.0233,
0.01313, 0.01323, 0.01307, 0.01299, 0.03514, 0.03781,
0.02797, 0.01676, 0.02171, 0.02383, 0.01352, 0.01345,
0.01325, 0.01316, 0.04424, 0.06795, 0.04498, 0.01393,
0.0142, 0.08169, 0.09263, 0.0886, 0.1365)),
.Names = c("node", "mean", "sd"),
class = "data.frame", row.names = c(NA, -29L))

Simulation of covariates effects on the probability of having an SCC > 200,000 cells/mL on the first milk recording following calving

## row numbers to be used
rw.mu      <- grep("mu", res$node)
rw.beta    <- grep("beta", res$node)
rw.theta   <- grep("theta", res$node)
rw.sig.hd  <- grep("sigma", res$node)
rw.rho     <- grep("rho", res$node)

## variables that need to be initialised
mu     <- rep(NA, 2)
beta   <- rep(NA, 20)
theta  <- rep(NA, 2)
alpha  <- rep(NA, 2)
sig.hd <- rep(NA, 2)
var.hd <- rep(NA, 2)
rho.hd <- NA
u      <- rep(NA, 2)

Simul <- function(DofParity, DofMilk, CalvDim, DofHerdPrev, CalvDay, pStyh, pNinh, nRep){

## Legendre polynomial for weight of milk
LPwom1 <- poly.legendre(x = min(DofMilk, 40), degree = 1, minx = 0, maxx = 40)[1]

## Legendre polynomials for parity
Par <- poly.legendre(x = min(DofParity, 10), degree = 2, minx = 1, maxx = 10)
LPar1 <- Par[1]
LPar2 <- Par[2]

## Legendre polynomials for days in milk on first milk recording
LDim <- poly.legendre(x = CalvDim, degree = 3, minx = 5, maxx = 35)
LDim1 <- LDim[1]
LDim2 <- LDim[2]
LDim3 <- LDim[3]

## Herd prevalence is centered on 0
Idh <- DofHerdPrev

## Day of the year mapped onto a 2 pi period
nDay <- 2 * pi * CalvDay / 366
## sine and cosine
coscDay <- cos(nDay)
sincDay <- sin(nDay)
cos2cDay <- cos(2*nDay)
sin2cDay <- sin(2*nDay)

## function to generate the prediction
Samp <- function(...){

for(niter in 1:nRep){

for(isg in 1:2){
sig.hd[isg] <- rnorm(n = 1, mean = res[rw.sig.hd[isg], 2], sd = res[rw.sig.hd[isg], 3])
}

## sampling from parent nodes
for(imu in 1:2){
mu[imu] <- rnorm(n = 1, mean = res[rw.mu[imu], 2], sd = res[rw.mu[imu], 3])
}

for(ibet in 1:20){
beta[ibet] <- rnorm(n = 1, mean = res[rw.beta[ibet], 2], sd = res[rw.beta[ibet], 3])
}

for(ithet in 1:4){
theta[ithet] <- rnorm(n = 1, mean = res[rw.theta[ithet], 2], sd = res[rw.theta[ithet], 3])
}

## Model
alpha[1] <-  rnorm(n = 1, mean = res[rw.mu[1], 2], sd = sig.hd[1])
alpha[2] <-  rnorm(n = 1, mean = res[rw.mu[2], 2], sd = sig.hd[2])

u[1] <- alpha[1] + theta[1] * Idh + theta[2] * pStyh
u[2] <- alpha[2] + theta[3] * (1 - Idh) + theta[4] * pNinh

Sty <- u[1] +
beta[1] * LPwom1 +
beta[2] * LPar1 +
beta[3] * LPar2 +
beta[4] * LDim1 +
beta[5] * LDim2 +
beta[6] * LDim3 +
beta[7] * coscDay +
beta[8] * sincDay +
beta[9] * cos2cDay +
beta[10] * sin2cDay

Ninf <- u[2] +
beta[11] * LPwom1 +
beta[12] * LPar1 +
beta[13] * LPar2 +
beta[14] * LDim1 +
beta[15] * LDim2 +
beta[16] * LDim3 +
beta[17] * coscDay +
beta[18] * sincDay +
beta[19] * cos2cDay +
beta[20] * sin2cDay

nsim <- matrix(c(1 - invlogit(Sty),
invlogit(Ninf)), nrow = 1)

if(exists("sim")) sim <- rbind(sim, nsim) else sim <- nsim

}

sim
}

z  <- Samp(nRep = nRep)
zz <- matrix(ncol = 2, nrow = nRep)
zz[,1] <- as.numeric(unlist(z[,1]))
zz[,2] <- as.numeric(unlist(z[,2]))
colnames(zz) <- c("Cur", "Ninf")

qtSim <- data.frame(Dim = CalvDim,
Parity = DofParity,
Milk = DofMilk,
HerdPrev = DofHerdPrev,
Day = CalvDay,
pStyh = pStyh,
pNinh = pNinh,
Cur2.5 = NA,
Cur50  =NA,
Cur97.5 = NA,
Ninf2.5 = NA,
Ninf50  =NA,
Ninf97.5 = NA)

qtl <- apply(zz, 2, quantile, c(.025, .5, .975))

qtSim[, grep("Cur", colnames(qtSim))] <- qtl[,"Cur"]
qtSim[, grep("Ninf", colnames(qtSim))] <- qtl[,"Ninf"]

if(exists("Sim")) Sim <- rbind(Sim, qtSim) else Sim <- qtSim

Sim

}

Correlation between new infection and cure rates

Samp.cor <- function(DofParity, DofMilk, CalvDim, DofHerdPrev, CalvDay, pStyh, pNinh, nRep){

## Legendre polynomial for weight of milk
LPwom1 <- poly.legendre(x = min(DofMilk, 40), degree = 1, minx = 0, maxx = 40)[1]

## Legendre polynomials for parity
Par <- poly.legendre(x = min(DofParity, 10), degree = 2, minx = 1, maxx = 10)
LPar1 <- Par[1]
LPar2 <- Par[2]

## Legendre polynomials for days in milk on first milk recording
LDim <- poly.legendre(x = CalvDim, degree = 3, minx = 5, maxx = 35)
LDim1 <- LDim[1]
LDim2 <- LDim[2]
LDim3 <- LDim[3]

## Herd prevalence is centered on 0
Idh <- DofHerdPrev

## Day of the year mapped onto a 2 pi period
nDay <- 2 * pi * CalvDay / 366
## sine and cosine
coscDay <- cos(nDay)
sincDay <- sin(nDay)
cos2cDay <- cos(2*nDay)
sin2cDay <- sin(2*nDay)

## function to generate the prediction
Samp <- function(...){

## variance covariance matrix
for(isg in 1:2){
sig.hd[isg] <- rnorm(n = 1, mean = res[rw.sig.hd[isg], 2], sd = res[rw.sig.hd[isg], 3])
}

rho.hd <- rnorm(n = 1, mean = res[rw.rho, 2], sd = res[rw.rho, 3])

var.hd[1] <- sig.hd[1] * sig.hd[1]
var.hd[2] <- sig.hd[2] * sig.hd[2]
cov.hd    <- rho.hd * sig.hd[1] * sig.hd[2]

SigMat <- matrix(c(var.hd[1], cov.hd, cov.hd, var.hd[2]), ncol = 2, byrow = TRUE)

## sampling from parent nodes
for(imu in 1:2){
mu[imu] <- rnorm(n = 1, mean = res[rw.mu[imu], 2], sd = res[rw.mu[imu], 3])
}

for(ibet in 1:20){
beta[ibet] <- rnorm(n = 1, mean = res[rw.beta[ibet], 2], sd = res[rw.beta[ibet], 3])
}

for(ithet in 1:4){
theta[ithet] <- rnorm(n = 1, mean = res[rw.theta[ithet], 2], sd = res[rw.theta[ithet], 3])
}

## Model
alpha <-  mvrnorm(n = 1, mu = mu, Sigma = SigMat)

u[1] <- alpha[1] + theta[1] * Idh + theta[2] * pStyh
u[2] <- alpha[2] + theta[3] * (1 - Idh) + theta[4] * pNinh

Sty <- u[1] +
beta[1] * LPwom1 +
beta[2] * LPar1 +
beta[3] * LPar2 +
beta[4] * LDim1 +
beta[5] * LDim2 +
beta[6] * LDim3 +
beta[7] * coscDay +
beta[8] * sincDay +
beta[9] * cos2cDay +
beta[10] * sin2cDay

Ninf <- u[2] +
beta[11] * LPwom1 +
beta[12] * LPar1 +
beta[13] * LPar2 +
beta[14] * LDim1 +
beta[15] * LDim2 +
beta[16] * LDim3 +
beta[17] * coscDay +
beta[18] * sincDay +
beta[19] * cos2cDay +
beta[20] * sin2cDay

nsim <- matrix(c(1 - invlogit(Sty),
invlogit(Ninf)), nrow = 1)

if(exists("sim")) sim <- rbind(sim, nsim) else sim <- nsim

sim
}

z <- as.data.frame(t(replicate(nRep, Samp(...))))
z[,1] <- unlist(z[,1])
z[,2] <- unlist(z[,2])
colnames(z) <- c("Cur", "Ninf")

z
}

Posted in Article | Tagged Dry period, Journal of dairy science, R code, Somatic Cell Count | Leave a Comment »

Modelling non linear associations using Legendre polynomials

May 8, 2012 by Aurélien

One way to model non linear associations is to use polynomials. However, this can be a problem since a polynomial different degrees are correlated. In R, it is possible to use the poly function in regression which, unless the raw argument is set to TRUE, uses orthogonal polynomials.

For our paper on the dynamics of milk somatic count over the dry period, the modelling was carried out in WinBUGS. Raw polynomials were producing autocorrelation so I needed to use orthognal polynomials. I thought of using the R poly function to calculate new predictors but did not manage to understand how it works. I needed to be able to predict the effects associated with the estimated coefficients easily, so I wanted the scaling of my predictors to be independent of the input vector. I have decided to use Legendre polynomials which are orthogonal and easy to calculate. Here is the function I used:

poly.legendre <- function(x, degree, minx, maxx){

if (min(x) < minx) stop("min(x) < minx")
if (max(x) > maxx) stop("max(x) > maxx")
if (degree > 4) stop("Degree must be < 5")

lg <- length(x)

z <- data.frame(x)

z$xP1 <- 2 * (x - minx) / (maxx - minx) - 1
z$xP2 <- (3 * z$xP1 * z$xP1 - 1) / 2
if(degree > 2) z$xP3 <- (5 * z$xP2 * z$xP1 - 2 * z$xP1) / 3
if(degree > 3) z$xP4 <- (7 * z$xP3 * z$xP1 - 3 * z$xP2) / 4

z  <- z[, -1]
}

We test it on a small toy example:

## Test data
test <- data.frame(
x = seq(0, 3, by = .1))

## Legendre polynomials
test <- cbind(
test,
poly.legendre(test$x, 4, 0, 3))

We check that this works when used in linear regression:

## Non linear association between y and x
test$y <- with(test, x - .5 * x^2 + .1 * x^3)

## Prediction from linear model using the R poly function
test$pred1 <- predict(lm(y ~ poly(x, 3), data = test))

## Prediction from linear model using Legendre polynomials
test$pred2 <- predict(lm(y ~ xP1 + xP2 + xP3, data = test))

## Plot of observed versus predicted
plot(y ~ x, data = test, type = "l")
lines(pred1 ~ x, data = test, lty = 2, col = "red")
lines(pred2 ~ x, data = test, lty = 3, col = "blue")

Both the R poly function and the Legendre polynomials function give a perfect prediction when used in our example.

Posted in Uncategorized | Tagged R code, Statistics, WinBUGS | 1 Comment »

R code: How to draw a circle

April 28, 2012 by Aurélien

I need to draw a circle on an existing map. In order to do that , I have written a function which takes the coordinates of the origin in x and y, and, the radius as arguments. There is also an add argument which specifies whether the circle should be added to the current plot.

circle <- function(xorig, yorig, radius, add, ...){

  x <- seq(-radius, radius, length.out = 1000)

  # Euclidian distance to the origin
  y <- sapply(x, function(z) sqrt(radius^2 - z^2))

  if(add == TRUE){

    lines(xorig + c(x, rev(x)), c(yorig + y, yorig + rev(-y)),
          type = "l", ...)

   } else {

   plot(xorig + c(x, rev(x)), c(yorig + y, yorig + rev(-y)),
        type = "l", ...)

   }
}

An example:

circle(0, 0, 1, add = FALSE,
       main = "A circle",
       xlab = "x",
       ylab = "y")

circle(.5, .5, .5, add = TRUE,
       col = "red")

Posted in Uncategorized | Tagged R code | 1 Comment »

R code: How to determine the median of a range stored as a character string

April 12, 2012 by Aurélien

I have character strings in which are stored ranges, for example 3-5, for which I need to calculate the median value. Here is how I do that:

test <- data.frame(
          range = c("3-5", "1-6", "8-9"))

test$med <- unlist(
             sapply(
              as.character(test$range), function(x){
                lapply(
                 strsplit(x, "-"), function(z) median(as.numeric(z)))
              }))

test
   range med
1   3-5 4.0
2   1-6 3.5
3   8-9 8.5

Posted in Uncategorized | Tagged R code | Leave a Comment »

A semi-parametric model for lactation curves

March 12, 2012 by Aurélien

Our article on lactation curves has just appeared in Preventive Veterinary Medicine:

We propose a semi-parametric model for lactation curves that, along with stage of lactation, accounts for day of the year at milk recording and stage of gestation. Lactation is described as having 3 different phases defined by 2 change points of which the second is a function of gestation stage. Season of milk recording is modelled using cosine and sine functions. As an application, the model is used to estimate the association between intramammary infections (IMI) dynamics as measured by somatic cell count (SCC) over the dry period and the shape of the lactation curve. Milk recording data collected in 2128 herds from England and Wales between 2004 and 2007 were used in the analysis. From a random sample of 1000 of these herds, smoothed milk production was used to test the behaviour of the model and estimate model parameters. The first change point was set at 60 days in milk. The second change point was set at 100 days of gestation or 200 days in milk when the latter was not available. Using data from the 1128 remaining herds, multilevel models were then used to model individual test-day milk production within lactations within herds. Average milk production at 60 days in milk for cows of parities 1, 2, 3 and greater than 3 were 26.9kg, 31.6kg, 34.4kg and 34.7kg respectively and, after this stage, decreases in milk production per 100 days milk of lactation were 3.1kg, 5.1kg, 6.3kg and 6.7kg respectively. Compared to cows that had an SCC below 200,000cells/mL on both the last milk recording in a lactation and the first milk recording in the following lactation, cows that had an SCC greater than 200,000cells/mL on their first milk recording after calving had an estimated loss of milk production of between 216 and 518kg depending on parity. These estimates demonstrate the impact of the dynamics of SCC during the dry period on milk production during the following lactation.

The basic idea was presented here. On this page you will find the R code that goes with the paper, apart from the association with stage of gestation which is accounted for in some of the models presented in the paper but not here.

Milk production, stage of lactation and time of the year

Below is an R function that generates lactation curves using the model presented in the paper.

###################################################################
### Function that generates predicted milk productions
### for a grid days in milk x day of the year
###################################################################

## The function takes the model's coefficients as arguments
## The 'plot' argument needs to be set to TRUE if a 3D surface
## is wanted - FALSE otherwise
PredMilk <- function(betas, plot){

tau1  <- 60
tau2  <- 200
ntau2 <- (tau2 - tau1) / 100

Dim <- 5:305
Doy <- 1:365

## Day of year mapped onto 2 period
X <- expand.grid(
Dim = Dim,
Doy = Doy)

X$Toy <- X$Doy / 365 * 2 * pi

X$Int     <- rep(1)
X$nDim    <- (X$Dim - tau1) / 100
X$I1nDim2 <- ifelse(X$Dim < tau1, X$nDim^2, 0)
X$I1nDim3 <- ifelse(X$Dim < tau1, X$nDim^3, 0)
X$I3fDim  <- ifelse(X$Dim >= tau2,
ntau2^2 - 2 * ntau2 * X$nDim + X$nDim^2, 0)

X$cosT <- cos(X$Toy)
X$sinT <- sin(X$Toy)

X$I1.nDim.cosT  <- ifelse(Dim < tau1, X$nDim * X$cosT, 0)
X$I1.nDim.sinT  <- ifelse(Dim < tau1, X$nDim * X$sinT, 0)
X$I23.nDim.cosT <- ifelse(Dim > tau1, X$nDim * X$cosT, 0)
X$I23.nDim.sinT <- ifelse(Dim > tau1, X$nDim * X$sinT, 0)

X$Milk <- as.matrix(X[,-(1:3)]) %*% betas

Z <- X[,c("Dim", "Doy", "Milk")]

Z <- reshape(Z, direction = "wide", idvar = "Dim", timevar = "Doy")
Z <- Z[,-1]

if(isTRUE(plot)){

x <- seq(5, 305, 5)
y <- seq(5, 305, 5)

par(mfrow = c(1, 2), mar = c(2, 2, 0, 2))
persp(x = x,
y = y,
z = as.matrix(Z[match(x, Dim), match(y, Doy)]),
theta = -60,
ticktype = "detailed",
xlab = "Day in milk",
ylab = "Day of year at recording",
zlab = "Milk (kg)")

persp(x = x,
y = y,
z = as.matrix(Z[match(x, Dim), match(y, Doy)]),
theta = 60,
ticktype = "detailed",
xlab = "Day in milk",
ylab = "Day of year at recording",
zlab = "Milk (kg)")

}

Milk <- list(Dim = Dim,
Doy = Doy,
Milk = Z)

}

Parity 1

Lactation curve from primiparous cows:

betas <- c(26.82, -3.081, 16.041, 77.301, -2.092,
 -.272, .992, -.755, 1.058, -.34, .059)

par(oma = c(0, 0, 2, 0))
PredMilk(betas = betas, plot = TRUE)
title("Parity 1", outer = TRUE)

Other parities

## Predicted milk production - Parity 2
betas <- c(31.561, -5.094, 16.041, 77.301, -3.148,
-.633, 1.208, -.755, 1.058, -.34, .059)

par(oma = c(0, 0, 2, 0))
PredMilk(betas = betas, plot = TRUE)
title("Parity 2", outer = TRUE)

## Predicted milk production - Parity 3
betas <- c(34.347, -6.282, 16.041, 77.301, -3.148,
-.633, 1.208, -.755, 1.058, -.34, .059)

par(oma = c(0, 0, 2, 0))
PredMilk(betas = betas, plot = TRUE)
title("Parity 3", outer = TRUE)

## Predicted milk production - Parity > 3
betas <- c(34.715, -6.708, 16.041, 77.301, -3.148,
-.633, 1.208, -.755, 1.058, -.34, .059)

par(oma = c(0, 0, 2, 0))
PredMilk(betas = betas, plot = TRUE)
title("Parity > 3", outer = TRUE)

Milk production and somatic cell count categories

Here is the code to generate the lactation curves for the different SCC categories:

###################################################################
### Function that plots milk production per day in milk
### Uses the predictions generated by the above function
###################################################################
plotMilk <- function(Betas, Thresh, Parit){

nmfig <- paste("T", Thresh, "_Par", Parit, ".jpg", sep ="")
nmcsv <- paste("T", Thresh, "_Par", Parit, ".csv", sep ="")

CumulProd <- data.frame(
catg = c("LL", "LH", "HL", "HH"),
cp = rep(NA, 4))

colr <- c("darkolivegreen", "lightblue", "orange", "red")

 lty <- c(1, 2, 1, 2)

for(i in 1:4){

Y <- PredMilk(betas = Betas[, i + 1], plot = FALSE)

plot(x = 5:305,
y = Y$Milk[,1],
type = "l", col = colr[i],
lty = lty[i], lwd = 2,
ylim = c(15, 40), ylab = "Milk (kg)",
xlab = "Day in milk")
par(new = TRUE)

CumulProd$cp[i] <- sum(Y$Milk[,1])

}
legend(200, 40,
c("Low-Low", "Low-High", "High-Low", "High-High"),
col = colr, bty = "n", lty = lty, lwd = 2)
title(paste("Threshold = ", Thresh," cells/mL - Parity = ", Parit,
sep = ""))

print(CumulProd)

}

Threshold = 200,000 cells/mL; parity 2

Betas <- data.frame(
coF = c("Int", "Dim", "I1Dim2", "I1Dim3", "I3fDim",
"cosT",
"sinT", "I1Dim.cosT", "I1Dim.sinT", "I23Dim.cosT",
"I23Dim.sinT"),
LL = c(32.74, -6.091, 6.478, 57.712, -.172, -.691, 1.605,
-1.328, .55, -.019, -.119),
LH = c(-.918, .292, -4.132, rep(0, 8)),
HL = c(.421, -.304, -1.882, rep(0, 8)),
HH = c(-.929, .186, -3.522, rep(0, 8)))

Betas$LH <- Betas$LL + Betas$LH
Betas$HL <- Betas$LL + Betas$HL
Betas$HH <- Betas$LL + Betas$HH

plotMilk(Betas = Betas, Thresh = 200, Parit = 2)

Threshold = 200,000 cells/mL; parity 3

Betas <- data.frame(
coF = c("Int", "Dim", "I1Dim2", "I1Dim3", "I3fDim",
"cosT",
"sinT", "I1Dim.cosT", "I1Dim.sinT", "I23Dim.cosT",
"I23Dim.sinT"),
LL = c(34.827, -6.869, 1.494, 54.95, -.187, -.603, 1.482,
-1.831, 1.217, -.252, -.056),
LH = c(-.892, .24, -3.473, rep(0, 8)),
HL = c(.752, -.496, -2.455, rep(0, 8)),
HH = c(-.693, -.143, -4.167, rep(0, 8)))

Betas$LH <- Betas$LL + Betas$LH
Betas$HL <- Betas$LL + Betas$HL
Betas$HH <- Betas$LL + Betas$HH

plotMilk(Betas = Betas, Thresh = 200, Parit = 3)

Threshold = 200,000 cells/mL; parity > 3

Betas <- data.frame(
coF = c("Int", "Dim", "I1Dim2", "I1Dim3", "I3fDim",
"cosT",
"sinT", "I1Dim.cosT", "I1Dim.sinT", "I23Dim.cosT",
"I23Dim.sinT"),
LL = c(35.17, -7.247, 3.741, 60.025, -.12, -.802, 1.304,
-1.972, 1.044, -.23, .061),
LH = c(-1.414, .396, -3.178, rep(0, 8)),
HL = c(.292, -.5471, -3.518, rep(0, 8)),
HH = c(-1.552, -.09, -4.37, rep(0, 8)))

Betas$LH <- Betas$LL + Betas$LH
Betas$HL <- Betas$LL + Betas$HL
Betas$HH <- Betas$LL + Betas$HH

plotMilk(Betas = Betas, Thresh = 200, Parit = 4)

Threshold = 100,000 cells/mL; parity 2

Betas <- data.frame(
coF = c("Int", "Dim", "I1Dim2", "I1Dim3", "I3fDim",
"cosT",
"sinT", "I1Dim.cosT", "I1Dim.sinT", "I23Dim.cosT",
"I23Dim.sinT"),
LL = c(32.593, -6.003, 7.824, 59.528, -.15, -.664, 1.624,
-1.287, .908, -.038, -.118),
LH = c(-.622, .218, -4.076, rep(0, 8)),
HL = c(.519, -.348, -1.054, rep(0, 8)),
HH = c(-.694, .049, -2.713, rep(0, 8)))

Betas$LH <- Betas$LL + Betas$LH
Betas$HL <- Betas$LL + Betas$HL
Betas$HH <- Betas$LL + Betas$HH

plotMilk(Betas = Betas, Thresh = 100, Parit = 2)

Threshold = 100,000 cells/mL; parity 3

Betas <- data.frame(
coF = c("Int", "Dim", "I1Dim2", "I1Dim3", "I3fDim",
"cosT",
"sinT", "I1Dim.cosT", "I1Dim.sinT", "I23Dim.cosT",
"I23Dim.sinT"),
LL = c(34.592, -6.651, 6.093, 61.814, -.178, -.7, 1.405,
-1.397, .966, -.187, -.025),
LH = c(-.768, .193, -2.988, rep(0, 8)),
HL = c(0.867, -.682, -2.761, rep(0, 8)),
HH = c(-.325, -.247, -4.451, rep(0, 8)))

Betas$LH <- Betas$LL + Betas$LH
Betas$HL <- Betas$LL + Betas$HL
Betas$HH <- Betas$LL + Betas$HH

plotMilk(Betas = Betas, Thresh = 100, Parit = 3)

Threshold = 100,000 cells/mL; parity > 3

Betas <- data.frame(
coF = c("Int", "Dim", "I1Dim2", "I1Dim3", "I3fDim",
"cosT",
"sinT", "I1Dim.cosT", "I1Dim.sinT", "I23Dim.cosT",
"I23Dim.sinT"),
LL = c(35.26, -7.145, 6.444, 64.402, -.082, -.745, 1.276,
-1.808, 1.017, -.294, .017),
LH = c(-1.145, .343, -3.289, rep(0, 8)),
HL = c(.435, -.572, -3.989, rep(0, 8)),
HH = c(-1.291, .007, -4.097, rep(0, 8)))

Betas$LH <- Betas$LL + Betas$LH
Betas$HL <- Betas$LL + Betas$HL
Betas$HH <- Betas$LL + Betas$HH

plotMilk(Betas = Betas, Thresh = 100, Parit = 4)

Posted in Article | Tagged Lactation curve, Mastitis, Poster, preventive veterinary medicine, R code, Somatic Cell Count | 1 Comment »

Older Posts »

Mieux Vaut Prévenir que Guérir!

Animal Health at the Population Level

R code: Equivalent of gsub for a vector of patterns

Milk recording

R code: Drawing an arrow using origin, length and angle

R code: How to count the number of occurrences of a substring within a string

R code: How to convert a list to a data.frame

Risk factors for a high somatic cell count at the first milk recording in a large sample of UK dairy herds

Functions

Model coefficients

Simulation of covariates effects on the probability of having an SCC > 200,000 cells/mL on the first milk recording following calving

Correlation between new infection and cure rates

Modelling non linear associations using Legendre polynomials

R code: How to draw a circle

R code: How to determine the median of a range stored as a character string

A semi-parametric model for lactation curves

Milk production, stage of lactation and time of the year

Parity 1

Other parities

Milk production and somatic cell count categories

Threshold = 200,000 cells/mL; parity 2

Threshold = 200,000 cells/mL; parity 3

Threshold = 200,000 cells/mL; parity > 3

Threshold = 100,000 cells/mL; parity 2

Threshold = 100,000 cells/mL; parity 3

Threshold = 100,000 cells/mL; parity > 3

Archives

Links

Tags