In today’s post, I will look at a new Naturemag climate reconstruction claiming unprecedentedness (h/t Bishop Hill): “Evolution of the Southern Annular Mode during the past millennium” (Abram et al Nature 2014, pdf). Unfortunately, it is marred by precisely the same sort of data mining and spurious multivariate methodology that has been repeatedly identified in Team paleoclimate studies.
The flawed reconstruction has been breathlessly characterized at the Conversation by Guy Williams, an Australian climate academic, as a demonstration that, rather than indicating lower climate sensitivity, the recent increase in Antarctic sea ice is further evidence that things are worse than we thought. Worse, it seems, than previously imagined even by Australian climate academics.
the apparent paradox of Antarctic sea ice is telling us that it [climate change] is real and that we are contributing to it. The Antarctic canary is alive, but its feathers are increasingly wind-ruffled.
A Quick Review of Multivariate Errors
Let me start by assuming that CA readers understand the basics of multivariate data mining. In an extreme case, if you do a multiple regression of a sine wave against a large enough network of white noise, you can achieve arbitrarily high correlations. (See an early CA post on this here discussing an example from Phillips 1998.)
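The white-noise extreme can be checked in a few lines. The sketch below is illustrative only: the sample size of 100, the predictor counts and the function name `r2_of_noise_regression` are assumptions made for this example, not anything taken from Phillips 1998 or the earlier CA post. It fits a sine wave by ordinary least squares on pure white-noise predictors and reports R² as the network grows.

```python
import numpy as np

def r2_of_noise_regression(n_obs=100, n_predictors=90, seed=0):
    """R^2 from an OLS fit of a sine wave on pure white-noise predictors."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_obs)
    target = np.sin(2 * np.pi * t / n_obs)               # the "signal" to be fitted
    noise = rng.standard_normal((n_obs, n_predictors))   # predictors containing no signal
    X = np.column_stack([np.ones(n_obs), noise])         # intercept plus noise network
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical network sizes, chosen only to show the progression.
for k in (5, 50, 90, 99):
    print(k, "noise predictors -> R^2 =", round(r2_of_noise_regression(n_predictors=k), 3))
```

Under these assumptions R² should climb toward 1 as the number of noise predictors approaches the number of observations, even though no predictor contains any signal.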
At the other extreme, if you really do have a network of proxies with a common signal, the signal is readily extracted through averaging without any ex post screening or correlation weighting with the target.
As discussed on many occasions, there are many seemingly “sensible” multivariate methods that produce spurious results when applied to modern trends. In our original articles on Mann et al 1998-1999, Ross and I observed that short-centered principal components analysis applied to networks of red noise is strongly biased toward producing hockey sticks. A related effect is that screening large networks based on correlation to modern trends is also biased toward producing hockey sticks. This has been (more or less independently) observed at numerous climate blogs, but is little known in academic climate literature. (Ross and I noted the phenomenon in our 2009 PNAS comment on Mann et al 2008, citing an article by David Stockwell in an Australian mining newsletter, though the effect had been previously noted at CA and other blogs.)
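The screening effect is easy to reproduce on synthetic data. The following is a minimal sketch, not a replication of any published study: the AR(1) coefficient of 0.9, the network size of 1000, the 100-step "calibration" window and the screening threshold of 0.5 are all assumptions picked for illustration. It generates independent red-noise series with no common signal, keeps those that correlate with a rising trend over the final window, and averages the survivors.

```python
import numpy as np

def ar1_network(n_series, n_time, phi, rng):
    """Independent AR(1) red-noise series with no common signal."""
    e = rng.standard_normal((n_series, n_time))
    x = np.empty_like(e)
    x[:, 0] = e[:, 0] / np.sqrt(1.0 - phi ** 2)   # start from the stationary distribution
    for t in range(1, n_time):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x

def screen_by_trend(series, calib, threshold):
    """Boolean mask of series whose last `calib` points correlate with a rising trend."""
    trend = np.arange(calib, dtype=float)
    tc = trend - trend.mean()
    tail = series[:, -calib:]
    xc = tail - tail.mean(axis=1, keepdims=True)
    r = (xc @ tc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(tc))
    return r > threshold

rng = np.random.default_rng(1)
CALIB = 100                                        # hypothetical calibration length
network = ar1_network(n_series=1000, n_time=600, phi=0.9, rng=rng)
keep = screen_by_trend(network, calib=CALIB, threshold=0.5)
composite = network[keep].mean(axis=0)             # average of the screened series

shaft_sd = composite[:-CALIB].std()                # variability before the calibration window
blade_rise = composite[-20:].mean() - composite[-CALIB:-CALIB + 20].mean()
print("series passing screening:", int(keep.sum()))
print("pre-calibration sd of composite:", round(float(shaft_sd), 3))
print("rise across calibration window:", round(float(blade_rise), 3))
```

In this setup the composite is nearly flat before the screening window and rises sharply within it, purely as a consequence of selecting on the trend.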
Weighting proxies by correlation to target temperature is the sort of thing that “makes sense” to climate academics, but is actually even worse than ex post correlation screening. It is equivalent to Partial Least Squares regression of the target against a network (e.g. here for a discussion). Any regression against a large number of predictors is vulnerable to overfitting, a phenomenon well understood with Ordinary Least Squares regression, but also applicable to Partial Least Squares regression. Hegerl et al 2007 (cited by Abram et al as an authority) explicitly weighted proxies by correlation to target temperature. See the CA post here for a comparison of methods.
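The equivalence with Partial Least Squares can be seen numerically. The sketch below assumes the textbook single-response PLS formulation, in which the first weight vector is proportional to X'y on standardized data; the 39 by 25 random matrix and the function names are hypothetical stand-ins, not data or code from any of the cited papers.

```python
import numpy as np

def standardize(a):
    """Center to zero mean and scale to unit sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def correlation_weighted_composite(X, y):
    """Composite in which each proxy is weighted by its correlation with the target."""
    Xs, ys = standardize(X), standardize(y)
    r = Xs.T @ ys / (len(ys) - 1)        # Pearson r of each column with y
    return Xs @ r

def pls1_first_score(X, y):
    """First score vector of single-response PLS: weights proportional to X'y."""
    Xs, ys = standardize(X), standardize(y)
    w = Xs.T @ ys
    w = w / np.linalg.norm(w)            # PLS normalizes the weight vector
    return Xs @ w

rng = np.random.default_rng(2)
X = rng.standard_normal((39, 25))        # hypothetical 39-year, 25-proxy network
y = rng.standard_normal(39)              # hypothetical target index

# After rescaling to a common variance the two composites coincide.
cw = standardize(correlation_weighted_composite(X, y))
pls = standardize(pls1_first_score(X, y))
print("max abs difference after rescaling:", float(np.max(np.abs(cw - pls))))
```

The two weight vectors differ only by a positive constant, so once the composite is scaled to the target the results are the same series.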
If one unpacks the linear algebra of Mann et al 1998-1999, an enterprise thus far neglected in academic literature, one readily sees that its regression phase in the AD1400 and AD1000 steps boils down to weighting proxies by correlation to the target (see here) – this is different from the bias in the principal components step that has attracted more publicity.
At Climate Audit, I’ve consistently argued that relatively simple averaging can recover the “signal” from networks with a common signal (which, by definition “proxies” ought to have). I’ve argued in favor of working from large population networks of like proxies without ex post screening or ex post correlation weighting.
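As a rough illustration of the averaging point, the sketch below builds a synthetic network in which every proxy is the same signal plus independent white noise. The signal (a standardized random walk), the noise level of twice the signal standard deviation and the network size of 50 are assumptions chosen for the example.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation of two 1-D series."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(3)
n_time, n_proxies = 200, 50                       # hypothetical network dimensions

signal = np.cumsum(rng.standard_normal(n_time))   # hypothetical common signal
signal = (signal - signal.mean()) / signal.std()

# Each proxy = common signal + independent noise with twice the signal's sd.
proxies = signal[:, None] + 2.0 * rng.standard_normal((n_time, n_proxies))

single_r = np.array([corr(proxies[:, j], signal) for j in range(n_proxies)])
average_r = corr(proxies.mean(axis=1), signal)    # unweighted, unscreened average

print("range of single-proxy correlations:",
      round(float(single_r.min()), 2), "to", round(float(single_r.max()), 2))
print("correlation of simple average with signal:", round(average_r, 2))
```

No screening or weighting against the target is needed here: the plain average tracks the signal far more closely than any individual proxy, and every individual correlation has the same sign.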
The Proxy Network of Abram et al 2014
Abram et al used a network of 25 proxies, some very short (5 begin only in the mid-19th century), with only 6 reaching back to AD1000, the start of their reconstruction. They calibrated this network to the target SAM index over a calibration period of 1957-1995 (39 years).
The network consists of 14 South American tree ring chronologies, 1 South American lake pigment series, one ice core isotope series from the Antarctic Peninsula and 9 ice core isotope series from the Antarctic continent. The Antarctic and South American networks are both derived from the previous PAGES2K networks, using the subset of South American proxies located south of 30S. (This eliminates the Quelccaya proxies, both of which were used upside down in the PAGES2K South American reconstruction.)
Abram et al described their proxy selection as follows:
We also use temperature-sensitive proxy records for the Antarctic and South America continental regions [5 - PAGES2k] to capture the full mid-latitude to polar expression of the SAM across the Drake Passage transect. The annually resolved proxy data sets compiled as part of the PAGES2k database are published and publically available [5]. For the South American data set we restrict our use to records south of 30 S and we do not use the four shortest records that are derived from instrumental sources. Details of the individual records used here and their correlation with the SAM are given in Supplementary Table 1.
However, their network of 14 South American tree ring chronologies is actually the product of heavy prior screening of an ex ante network of 104 (!!) chronologies. (One of the ongoing methodological problems in this field is the failure of authors to properly account for prior screening and selection).
The PAGES2K South American network was contributed by Neukom, the co-lead author of Gergis et al 2012. Neukom’s multivariate work is an almost impenetrable maze of ex post screening and ex post correlation weighting. If Mannian statistics is Baroque, Neukom’s is Rococo. CA readers will recall that non-availability of data deselected by screening was an issue in Gergis et al. (CA readers will recall that David Karoly implausibly claimed that Neukom and Gergis “independently” discovered the screening error in Gergis et al 2012 on the same day that Jean S reported it at Climate Audit.) Although Neukom’s proxy network has become increasingly popular in multiproxy studies, I haven’t been able to parse his tree ring chronologies as Neukom has failed to archive much of the underlying data and refused to provide it when requested.
Neukom’s selection/screening of these 14 chronologies was done in Neukom et al 2011 (Clim Dyn) using a highly non-standard algorithm which rated thousands of combinations according to verification statistics. While not a regression method per se, it is an ex post method and, if eventually parsed, will be subject to similar considerations as regression methods: the balloon is still being squeezed.
The Multivariate Methodology of Abram et al 2014
Abram et al used a methodology equivalent to the regression methodology of the AD1400 and AD1000 steps of Mann et al 1998-1999, a methodology later used (apparently unawares) in Hegerl et al 2007, who are cited by Abram et al.
In this methodology, proxies are weighted by their correlation coefficient with the resulting composite scaled to the target. Abram et al 2014 described their multivariate method as follows (BTW “CPS” normally refers to unweighted composites):
We employ the widely used composite plus scale (CPS) methodology [5- PAGES2K,11 - Jones et al 2009, 12 - Hegerl et al 2007] with nesting to account for the varying length of proxies making up the reconstruction. For each nest the contributing proxies were normalized relative to the AD 1957-1995 calibration interval…
The normalized proxy records were then combined with a weighting [12- Hegerl et al 2007] based on their correlation coefficient (r) with the SAM during the calibration interval (Supplementary Table 1). The combined record was then scaled to match the mean and standard deviation of the instrumental SAM index during the calibration interval. Finally, nests were spliced together to provide the full 1,008-year SAM reconstruction.
Although Abram et al (and their reviewers) were apparently unaware, this methodology is formally equivalent to MBH99 regression methodology and to Partial Least Squares regression. Right away, one can see the peril of calibration-period overfitting when one uses a network of 25 proxies to fit over a calibration period of only 39 years. Such overfitting is particularly bad when proxies are flipped over (see another old CA post here; I am unaware of anything equivalent in academic climate literature).
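The scale of the overfitting risk at these dimensions can be gauged with a small Monte Carlo. This is a sketch under stated assumptions rather than a reproduction of Abram et al: the proxies and the target are independent white noise, the trial count of 500 is arbitrary, and the function name `calibration_r` is made up for the example. Only the 25-proxy, 39-year dimensions come from the paper as described above.

```python
import numpy as np

def standardize(a):
    """Center to zero mean and scale to unit sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def calibration_r(n_years=39, n_proxies=25, n_trials=500, seed=4):
    """In-sample correlation with the target for composites built from pure noise."""
    rng = np.random.default_rng(seed)
    weighted, unweighted = [], []
    for _ in range(n_trials):
        X = standardize(rng.standard_normal((n_years, n_proxies)))  # signal-free "proxies"
        y = standardize(rng.standard_normal(n_years))               # signal-free "target"
        r = X.T @ y / (n_years - 1)               # proxy-target correlations
        # Weighting by r automatically flips any proxy with negative correlation.
        weighted.append(np.corrcoef(X @ r, y)[0, 1])
        unweighted.append(np.corrcoef(X.mean(axis=1), y)[0, 1])
    return np.array(weighted), np.array(unweighted)

weighted, unweighted = calibration_r()
print("median calibration r, correlation-weighted:",
      round(float(np.median(weighted)), 2))
print("median |calibration r|, simple average:",
      round(float(np.median(np.abs(unweighted))), 2))
```

With no signal anywhere in the data, the correlation-weighted composite still shows a substantial calibration-period correlation in every trial, while the simple average does not. Real proxies are autocorrelated, which this white-noise sketch does not model.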
The Abram/PAGES2K South American Tree Ring Network
The Abram/PAGES2K South American tree ring network is an almost classic example of what not to do. Below is an excerpt from their Supplementary Table 1 listing their South American proxies, together with their correlation (r) to the target SAM index and the supposed “probability” of the correlation:
Right away you should be able to see the absurdity of this table. The average correlation of chronologies in the tree ring network to the target SAM index is a Mannian -0.01, with correlations ranging from -0.289 to +0.184.
There’s an irony to the average correlation being so low. Villalba et al 2012, in Nature Geoscience, also considered a large network of Patagonian tree ring chronologies (many of which were identical to Neukom et al 2011 sites), showing a very noticeable decline in ring widths over the 20th century (with declining precipitation) and a significant negative correlation to Southern Annular Mode (specifically discussed in the article). It appears to me that Neukom’s prior screening of South American tree ring chronologies according to temperature (reducing the network from 104 to 14) made the network much less suitable for reconstruction of Southern Annular Mode (which is almost certainly more clearly reflected in precipitation proxies).
The distribution of correlation coefficients in Abram et al is inconsistent with the network being a network of proxies for SAM. Instead of an average correlation of ~0, a network of actual proxies should have a significant positive (or negative) correlation, and, in a “good” network of proxies of the same type (e.g. Patagonian tree ring chronologies), all correlations will have the same sign.
Nonetheless, Abram et al claim that chronologies with the most extreme correlation coefficients within the network (both positive and negative) are also the most “significant” (as measured by their p-value). They obtained this perverse result as follows: the “significance” of their correlations “were assessed relative to 10000 simulations on synthetic noise series with the same power spectrum as the real data [31 - Ebisuzaki, J. Clim 1997]”. Thus both upward-trending and downward-trending series were assessed as more “significant” within the population of tree ring chronologies and given higher weighting in the reconstruction.
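To see why such a test rewards extreme correlations of either sign, here is a minimal sketch of a phase-randomization test in the general style of Ebisuzaki 1997. It is not the authors' code: the choice to randomize the proxy rather than the target, the two-sided comparison of |r|, the 2000 simulations and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation of two 1-D series."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def phase_randomized(x, rng):
    """Surrogate series with the same power spectrum as x but random phases."""
    n = len(x)
    spec = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0                       # keep the zero-frequency term real
    if n % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist term real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=n)

def two_sided_p(proxy, target, n_sim=2000, seed=0):
    """Share of surrogates whose |r| with the target is at least the observed |r|."""
    rng = np.random.default_rng(seed)
    r_obs = abs(pearson_r(proxy, target))
    r_sim = np.array([abs(pearson_r(phase_randomized(proxy, rng), target))
                      for _ in range(n_sim)])
    return float(np.mean(r_sim >= r_obs))

rng = np.random.default_rng(5)
target = rng.standard_normal(39)                  # hypothetical 39-year index
proxy = 0.3 * target + rng.standard_normal(39)    # hypothetical proxy

print("p for proxy:        ", two_sided_p(proxy, target))
print("p for flipped proxy:", two_sided_p(-proxy, target))
```

Under these assumptions a series and its mirror image receive the same p-value, so a test of this form cannot distinguish a proxy with the physically expected sign from one with the opposite sign.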
The statistical reference of Abram et al was designed for a different problem. Their calculations of significance are done incorrectly. Neither their network of tree ring chronologies nor their multivariate method is suitable for their task. The coefficients clearly show the unsuitability.
A reconstruction using the methods of Abram et al 2014, especially accumulating the previous screening of Neukom et al 2011, is completely worthless for estimating prior Southern Annular Mode. This is different from being “WRONG!”, the adjective that is too quickly invoked in some skeptic commentary.
Despite my criticism, I think that proxies along the longitudinal transect of South America are extremely important and that the BAS Antarctic Peninsula ice core isotope series from James Ross Island is of great importance (and that it meets virtually all CA criteria for an ex ante “good” proxy.)
However, Abram et al is about as far from a satisfactory analysis of such proxies as one can imagine. It is too bad that Naturemag appears unequal to identifying even elementary methodological errors in articles that claim unprecedentedness. Perhaps they should reflect on their choice of peer reviewers for paleoclimate articles.