To uncover relationships between variables without resorting to complicated models, it can be useful to smooth your data. Several techniques are available, from very simple moving averages to more complicated generalized additive models. When I have to do this, I have come to like local regression as implemented in the R loess function.
To illustrate how it works, I generate some data using a nonlinear function for the mean and a normal distribution for the random noise.
set.seed(824)
# Cubic mean function plus one draw of normal noise per call
f1 <- function(x){
  y <- x + -2 * x^2 + 1.5 * x^3 + rnorm(1, mean = 0, sd = .03)
}
myDat <- data.frame(
  x = seq(0, 1, by = .01),
  y = rep(NA))
myDat$y <- sapply(myDat$x, f1)
The data are shown in the figure below:
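For readers who want to redraw this figure, here is a minimal sketch using base graphics. It regenerates the data with the same seed and function as above; the dashed noise-free curve and the plot title are my own additions for orientation, not part of the original figure.

```r
# Regenerate the simulated data (same seed and mean function as in the post)
set.seed(824)
f1 <- function(x) x - 2 * x^2 + 1.5 * x^3 + rnorm(1, mean = 0, sd = .03)
myDat <- data.frame(x = seq(0, 1, by = .01))
myDat$y <- sapply(myDat$x, f1)

# Scatter plot of the noisy observations
plot(y ~ x, data = myDat, pch = 20, main = "Simulated data")

# Assumption: overlay the noise-free mean as a dashed reference line
curve(x - 2 * x^2 + 1.5 * x^3, from = 0, to = 1, add = TRUE, lty = 2)
```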
Using loess is really simple: the syntax is the same as for other model-fitting functions such as lm. The degree of smoothness is controlled by the span argument, which sets the fraction of the data used in each local fit, so larger values give smoother curves. By calling predict either on the original data or on a vector (or grid) of new x values, you obtain the smoothed curve. The following code shows how to use loess on our data. Four values of span are tested, and the smoothed curves are plotted over the original data.
limy <- c(0, .6)
spns <- c(.05, .1, .75, 2)
par(mfrow = c(2, 2))
# Loop over the 4 span values
for(i in seq_along(spns)){
  # Loess model
  myMod <- loess(y ~ x, data = myDat, span = spns[i])
  # The response is predicted back on the original data
  # (predict.loess takes newdata; a data argument is silently ignored)
  myDat$pred <- predict(myMod, newdata = myDat)
  # Plot of the raw data with the smoothed curve overlaid
  plot(y ~ x, data = myDat, pch = 20, main = paste("Span =", spns[i]), ylim = limy)
  lines(pred ~ x, data = myDat, col = "red", lwd = 2)
}
The figure below shows the different degrees of smoothness obtained for different values of span.
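The loop above predicts back on the original x values only. Predicting on a grid of generated data, as mentioned earlier, could look like the sketch below. The grid step of .001, the span of .75, and the name grid are illustrative choices of mine. Note that with its default settings loess does not extrapolate, so grid points outside the observed range of x would come back as NA.

```r
# Regenerate the simulated data (same seed and mean function as in the post)
set.seed(824)
f1 <- function(x) x - 2 * x^2 + 1.5 * x^3 + rnorm(1, mean = 0, sd = .03)
myDat <- data.frame(x = seq(0, 1, by = .01))
myDat$y <- sapply(myDat$x, f1)

# Fit once with an illustrative span
myMod <- loess(y ~ x, data = myDat, span = .75)

# Hypothetical finer grid, kept inside the observed range of x
grid <- data.frame(x = seq(0, 1, by = .001))
grid$pred <- predict(myMod, newdata = grid)

# Raw data with the smoothed curve evaluated on the grid
plot(y ~ x, data = myDat, pch = 20, main = "Span = 0.75, predicted on a grid")
lines(pred ~ x, data = grid, col = "red", lwd = 2)
```

A grid is mostly useful when the observations are sparse or unevenly spaced, since the curve is then drawn at evenly spaced points rather than only where data happen to exist.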