More Ocean-sized Errors in Levitus et al. | Watts Up With That?

by Willis Eschenbach

Previously, we discussed the errors in Levitus et al. in “An Ocean of Overconfidence”.

Unfortunately, the supplemental information for the new Levitus et al. paper has not been published. Fortunately, WUWT regular P. Solar has located a version of the preprint containing their error estimate, available here. This is how they describe the start of the procedure that produces their estimates:

From every observed one-degree mean temperature value at every standard depth level we subtract off a climatological value. For this purpose we use the monthly climatological fields of temperature from Locarnini et al. [2010].

Now, the “climatology” means the long-term average (mean) of the variable. In this case, it is the long-term average for each 1° X 1° gridcell, at each depth. Being a skeptical type of fellow, I thought, “How much data do they actually have?” It is important because the less data they have, the larger the expected error in the long-term mean, which is called the “standard error of the mean”.
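To make that concrete, here is a minimal R sketch of how the standard error of the mean grows as observations get scarce. The spread of 1°C is a made-up value for illustration, not a figure from WOA09, and the observations are assumed to be independent.

```r
# Hypothetical standard deviation of the observations in one gridcell (an assumption)
obs_sd <- 1.0

# Number of observations available for the long-term mean
n_obs <- c(3, 10, 30, 100)

# Standard error of the mean: standard deviation divided by sqrt(N)
sem_by_n <- obs_sd / sqrt(n_obs)

print(data.frame(n_obs, sem = round(sem_by_n, 3)))
```

With only three observations, the error in the mean is still more than half the spread of the individual observations.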

Regarding the climatology, they say that it is from the World Ocean Atlas 2009 (WOA09), viz: “… statistics at all standard levels and various climatological averaging periods are available at http://www.nodc.noaa.gov/OC5/WOA09F/pr_woa09f.html”

So I went there to see what kind of numbers they have for the monthly climatology at 2000 metres depth … and I got this answer:

The temperature monthly climatologies deeper than 1500 meters have not been calculated.

Well, that sux. How do the authors deal with that? I don’t have a clue. Frustrated at 2000 metres, I figured I’d get the data for the standard error of the mean (SEM) for some month, say January, at 1500 metres. Figure 1 shows their map of the January SEM at 1500 metres depth:

Figure 1. Standard error of the mean (SEM) for the month of January at 1500 metres depth. White areas have no data. SOURCE

YIKES! In 55 years, only 5% of the 1° X 1° gridcells have three observations or more for January at 1500 metres … and they are calculating averages?

 

Now, statistically cautious folks like myself would look at that and say “Well … with only 5% coverage, there’s not much hope of getting an accurate average”. But that’s why we’re not AGW supporters. The authors, on the other hand, forge on.

Not having climatological data for 95% of the ocean at 1500 metres, what they do is take an average of the surrounding region, and then use that value. However, with only 5% of the gridcells having 3 observations or more, that procedure seems … well, wildly optimistic. It might be useful for infilling if we were missing say 5% of the observations … but when we are missing 95% of the ocean, that just seems goofy.

So how about at the other end of the depth scale? Things are better at the surface, but not great. Here’s that map:

Figure 2. Standard error of the mean (SEM) for the month of January at the surface. White areas have no data. Source as in Fig. 1

As you can see, there are still lots and lots of areas without enough January observations to calculate a standard error of the mean … and in addition, for those that do have enough data, the SEM is often greater than half a degree. When you take a very accurate temperature measurement and subtract from it a climatology with an error of ± half a degree, you are greatly reducing the precision of the result.
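A rough sketch of that loss of precision, assuming the measurement error and the climatology error are independent so that they add in quadrature. The 0.005°C instrument error is a hypothetical value chosen for illustration; the 0.5°C figure is the climatology SEM discussed above.

```r
obs_err  <- 0.005  # hypothetical error of a single temperature measurement, deg C (an assumption)
clim_err <- 0.5    # standard error of the climatological mean, deg C

# Independent errors add in quadrature when one value is subtracted from the other
anomaly_err <- sqrt(obs_err^2 + clim_err^2)

print(round(anomaly_err, 4))
```

Under those assumptions, the anomaly can never be more precise than the climatology that was subtracted from it, however good the thermometer.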

w.

APPENDIX 1: The data for this analysis was downloaded as an NCDF file from here (WARNING: 570 Mb FILE!). It is divided into 1° gridcells and has 24 depth levels, with a maximum depth of 1500 metres. It shows that some 42% of the gridcell/depth/month combinations have no data. Another 17% have only one observation for the given gridcell and depth, and 9% have two observations. In other words, the median number of observations for a given month, depth, and gridcell is 1 …
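The median follows directly from those percentages: it is the smallest observation count at which the cumulative fraction reaches one half. A small R check, using only the three percentages quoted above:

```r
obs_count <- c(0, 1, 2)            # number of observations in a gridcell/depth/month combination
fraction  <- c(0.42, 0.17, 0.09)   # share of combinations with that count, from the text above

cum_fraction <- cumsum(fraction)   # running total of the shares

# The median is the first count whose cumulative share reaches 50%
median_count <- obs_count[which(cum_fraction >= 0.5)[1]]
print(median_count)
```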

APPENDIX 2: the code used to analyze the data (in the computer language “R”) is:

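# NOTE: the "ncdf" package has since been archived on CRAN; its successor "ncdf4" provides nc_open() and ncvar_get()

require(ncdf)

mync=open.ncdf("temperature_monthly_1deg.nc")

mytemps=get.var.ncdf(mync,"t_gp") # read but not used below

tempcount=get.var.ncdf(mync,"t_dd") # number of observations per gridcell/depth/month

myse=get.var.ncdf(mync,"t_se") # standard error of the mean; read but not used below

allcells=length(which(tempcount!=-2147483647)) # -2147483647 marks missing entries

zerocells=length(which(tempcount==0)) # use ==1 or ==2 for the other percentages in Appendix 1

zerocells/allcells

hist(tempcount[which(tempcount!=-2147483647)],breaks=seq(0,6000,1),xlim=c(0,40))

tempcount[which(tempcount==-2147483647)]=NA

whichdepth=24 # deepest of the 24 levels, 1500 metres

zerodata=length(which(tempcount[,, whichdepth,1]==0)) # month index 1 = January

totaldata=length(which(!is.na(tempcount[,, whichdepth,1])))

under3data=length(which(tempcount[,, whichdepth,1] < 3))

length(tempcount[,, whichdepth,1])

1-under3data/totaldata # fraction of gridcells with three or more observations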

APPENDIX 3: A statistical oddity. In the course of doing this, I got to wondering about how accurate the calculation of the standard error of the mean (SEM) might be when the sample size is small. It’s important since so many of the gridcell/depth/month combinations have only a few observations. The normal calculation of the SEM is the standard deviation divided by the square root of N, the sample size.

I did an analysis of the question, and I found that as the number of samples N decreases, the normal calculation of the SEM progressively underestimates the SEM more and more. In the extreme case of only three data points in the sample, which is the case for much of the WOA09 monthly climatology, the SEM calculation underestimates the actual standard error of the mean by about 12%. This doesn’t sound like a lot, but it means that the nominal 95% confidence interval of ± 1.96 * SEM contains the true value only about 80% of the time, not 95%.
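The coverage figure can be checked without simulation, assuming the observations are independent and normally distributed. In that case the error of the mean divided by the estimated SEM follows Student's t distribution with N - 1 degrees of freedom, so the coverage of a ± 1.96 * SEM interval comes straight from R's built-in pt() and qt() functions. This is a sketch of a standard textbook result, not the analysis described above.

```r
n <- 3:10  # sample sizes

# Probability that the true mean lies within +/- 1.96 * SEM of the sample mean
coverage <- 2 * pt(1.96, df = n - 1) - 1

# Multiplier needed in place of 1.96 for a true 95% interval
t_mult <- qt(0.975, df = n - 1)

print(data.frame(n, coverage = round(coverage, 3), t_mult = round(t_mult, 2)))
```

For N = 3 the coverage is about 81%, and the multiplier needed for a true 95% interval is about 4.3 rather than 1.96.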

Further analysis shows that the standard calculation of the SEM needs to be multiplied by

1 + 0.43 * N^(-1.2)

to be approximately correct, where N is the sample size. For N = 3 this gives a factor of about 1.12, matching the 12% underestimate noted above.

I also tried using [standard deviation divided by sqrt (N-1)] to calculate the SEM, but that consistently overestimated the SEM at small sample sizes.

The code for this investigation was:

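# standard error of the mean: standard deviation divided by sqrt(N), counting only non-missing values

sem=function(x) sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x)))

# or, alternate sem function using N-1

# sem=function(x) sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x)) - 1)

nobs=30000 # number of trials

ansbox=rep(NA,20) # fractional underestimate of the SEM, indexed by sample size

for (n in 3:20){ # n is the sample size

    mybox=matrix(rnorm(nobs*n),n) # each column is one trial of n normal draws

    themeans=colMeans(mybox)

    thesems=apply(mybox,2,sem)

    ansbox[n]=round(sd(themeans)/mean(thesems)-1,3)

}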
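As a cross-check on the simulation, the expected ratio has a closed form for normally distributed data: the sample standard deviation underestimates the true value by the standard bias factor usually written c4(N). The sketch below compares 1/c4(N) - 1 with the empirical fit 0.43 * N^(-1.2); reading the fit that way is an assumption about the intended formula.

```r
# Bias factor of the sample standard deviation for normal data: E[sd] = c4(N) * sigma
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

n <- 3:20

# Fractional underestimate of the SEM expected from theory
exact <- 1 / c4(n) - 1

# Empirical fit to the same quantity (assumed form, see the note above)
fit <- 0.43 * n^(-1.2)

print(data.frame(n, exact = round(exact, 3), fit = round(fit, 3)))
```

At N = 3 the exact value is about 0.128 and the fit gives about 0.115, so the two agree to within about 0.015.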

 
