The Southern Oscillation Index (SOI) measures the standardized sea level pressure difference between Tahiti and Darwin, Australia, recorded monthly since 1951.
Negative values represent below-normal air pressure in Tahiti and above-normal pressure in Darwin, while positive values represent above-normal pressure in Tahiti and below-normal pressure in Darwin.
Prolonged periods of negative values coincide with warm ocean waters in the eastern tropical Pacific (El Nino), and prolonged periods of positive values correspond to La Nina.
El Nino generally brings
* colder and wetter weather to the Southern U.S.
* warmer weather to Western Canada and Southern Alaska
* drier weather to the Pacific Northwest
* cooler weather to Northern Canada
* wetter weather to Southern California
* warmer weather and less snow to the Northern U.S.
La Nina generally brings
* cooler and wetter weather to the Northern U.S. (polar vortex)
Let’s take a look at the series.
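tmp <- read.csv("SOI.csv")   # SOI values are in the second column
soi <- ts(tmp[,2], start=c(1951,1), frequency=12)   # monthly series starting January 1951
plot(soi, main="Southern Oscillation Index")
abline(h=0, col="gray50")   # reference line at zero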
Now let’s take a look at the series decomposed.
plot(stl(soi, s.window="periodic") )
From the plot we see that the series has been decomposed into seasonal, trend, and remainder (random) parts. However, think back to Wednesday’s class: an algorithm performs this decomposition, and it returns all three parts whether or not they are meaningful. It is now up to us, the statisticians, to analyze the output and decide what is going on. Let’s look at the decomposition a little closer.
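To make that point concrete, here is a small sketch on simulated white noise. This is a stand-in series, not the SOI data, and the names `noise` and `noise.stl` are made up for illustration. The series has no seasonality by construction, yet `stl()` still returns a seasonal part, so seeing one in the plot does not by itself mean the data are seasonal.

```r
set.seed(42)

# 30 years of monthly white noise: no seasonal pattern by construction
noise <- ts(rnorm(12 * 30), start = c(1951, 1), frequency = 12)

noise.stl <- stl(noise, s.window = "periodic")

# The decomposition always has these three columns
colnames(noise.stl$time.series)

# The estimated seasonal part is small but not zero
range(noise.stl$time.series[, "seasonal"])
```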
soi.stl <- stl(soi, s.window="periodic")
plot(soi.stl$time.series[,"seasonal"]) # Plot the seasonal component
Pay attention to the y-axis in the plot. The seasonal effects range from roughly -0.15 to 0.15, while the raw data tend to stay within +/- 2.5 (with some jumps bigger than that). Some simple arithmetic: the seasonal effects span roughly 0.30 units, compared to an overall range of about 6 once the larger jumps are included, and 0.30/6 = 0.05. So the seasonal effects account for only about 5% of the range we see, which is not much of an influence. The SOI series does not appear to have (much of) a seasonal component (at least deterministically).
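The same back-of-the-envelope ratio can be computed directly instead of being read off the axes. The sketch below wraps it in a helper, `seasonal_range_share()`, which is a hypothetical name and not part of any package. For contrast it is applied to `nottem`, a strongly seasonal monthly temperature series that ships with R, rather than to the SOI file.

```r
# Hypothetical helper: range of the STL seasonal component
# divided by the range of the raw series.
seasonal_range_share <- function(x) {
  fit <- stl(x, s.window = "periodic")
  seasonal <- fit$time.series[, "seasonal"]
  diff(range(seasonal)) / diff(range(x))
}

# nottem: monthly air temperatures at Nottingham, 1920-1939 (built-in dataset)
share <- seasonal_range_share(nottem)
share
```

Calling `seasonal_range_share(soi)` on the series loaded earlier should give a value close to the 0.05 worked out by hand. Note that this is a ratio of ranges, which is a rough guide only; a ratio of variances would be the more usual way to talk about variation explained.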
Let’s make another plot of the SOI to help look for El Nino and La Nina.
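colors <- rep("gray50", length(soi))   # default color for values exactly at zero
colors[soi<0] <- "blue"   # negative SOI (El Nino side)
colors[soi>0] <- "red"    # positive SOI (La Nina side)
plot(time(soi), as.vector(soi), main="Southern Oscillation Index",
     sub="El Nino or La Nina", col=colors, type='h')
abline(h=0, col="gray2")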
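Since it is the prolonged runs of one sign that matter, it can help to list them rather than only eyeball them. Here is a minimal sketch using `rle()` on a short made-up vector (these are not real SOI readings). The threshold `min_run` of 5 months is an assumption chosen for illustration, not an official definition of an El Nino or La Nina event.

```r
# Toy values, made up for illustration
x <- c(0.5, -0.2, -0.8, -1.1, -0.4, -0.9, 0.3, 0.7, -0.1, 0.2, 1.0, 0.6)
min_run <- 5   # assumed minimum run length, in months

runs   <- rle(sign(x))              # consecutive runs of the same sign
ends   <- cumsum(runs$lengths)      # index where each run ends
starts <- ends - runs$lengths + 1   # index where each run starts
long   <- runs$lengths >= min_run & runs$values != 0

spells <- data.frame(
  start = starts[long],
  end   = ends[long],
  phase = ifelse(runs$values[long] < 0, "negative", "positive")
)
spells   # one spell: positions 2 to 6, negative
```

To use this on the real series, pass `as.vector(soi)` in place of `x` (since `rle()` expects a plain vector), and look up dates with `time(soi)[spells$start]`.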
Now let’s make the same plots with ggplot2. First is the basic time series plot.
library(ggplot2)
neg.soi <- soi<0
soi.data <- data.frame(Time=time(soi), SOI=as.vector(soi), neg.soi=neg.soi)
ggplot(soi.data, aes(x=Time, y=SOI)) + geom_line()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous
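The message appears because `time()` returns an object of class `ts`, and that class is carried into the plot data. It is harmless here, but one way to avoid it is to convert the time index to plain numbers before building the data frame. The sketch below uses a simulated stand-in series `x` (an assumption, not the SOI data) and base R only.

```r
set.seed(1)
x <- ts(rnorm(24), start = c(1951, 1), frequency = 12)   # stand-in monthly series

class(time(x))                 # "ts"

t_num <- as.numeric(time(x))   # strip the ts class, keep decimal years
class(t_num)                   # "numeric"

# Data frame with a plain numeric time column
df <- data.frame(Time = t_num, SOI = as.vector(x))
head(df, 3)
```

With the real data this would amount to writing `Time=as.numeric(time(soi))` when constructing `soi.data`.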
Now let’s try the colorized version in two ways: first with a continuous color gradient, then with two fixed colors.
ggplot(soi.data) +
  geom_segment(aes(x=Time, xend=Time, y=0, yend=SOI, color=SOI)) +
  scale_color_gradient2(low="blue", high="red", midpoint=0) +
  theme_bw()
## Warning: Non Lab interpolation is deprecated
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous
ggplot(soi.data) +
  geom_segment(aes(x=Time, xend=Time, y=0, yend=SOI, color=neg.soi)) +
  scale_color_manual(values=c("FALSE"="red", "TRUE"="blue")) +   # TRUE means SOI < 0
  theme_bw()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous