Guest post by Willis Eschenbach (@weschenbach on eX-Twitter)
Our venerable moderator Charles, to whom I am eternally grateful for his work at WUWT, asked me to take a look at a new paper yclept "Multivariate analysis rejects theory of anthropogenic increase in atmospheric carbon dioxide: sea surface temperature rule", by Dai Ato, a Japanese independent researcher. I suspected there was a trick somewhere. I'll call this paper Ato2024.
I hadn't gone very far when the alarm bells went off. The paper says:
This study used publicly available data to conduct a multivariate analysis examining the effects of sea surface temperature (SST) and human emissions on atmospheric carbon dioxide levels.
It is concluded that sea surface temperature is an independent determinant of the annual increase in atmospheric carbon dioxide concentration. Human emissions were found to be unrelated to the regression model.
Most tellingly, it says:
In addition, the atmospheric CO2 concentration predicted using the UK-HADLEY center's sea temperature regression equation after 1960 has an extremely high correlation with the actual CO2 concentration (Pearson correlation coefficient r = 0.9995, P < 3e-92).
Buzz! Whenever I get an r value that high, I know I'm doing something very wrong…I'll come back to this.
Let me start with one of the three variables in the analysis (sea surface temperature, carbon dioxide, and emissions), namely sea surface temperature. Below are three reconstructions of sea surface temperature since 1854 by three different groups.
Figure 1. Global monthly mean sea surface temperature (SST). The yellow area on the right marks the period used in the Ato analysis.
While there are some differences, the overall pattern is clear. Sea surface temperatures warmed from about 1850 to about 1870, cooled to about 1910, warmed to about 1940, cooled to about 1965, and warmed thereafter.
Looking at this, I can understand why Ato didn't want to use the full record: it doesn't support his claim that sea surface temperature is an independent determinant of atmospheric carbon dioxide levels. The CO2 data (Figures 2 and 3 below) look nothing like this.
So how does he justify leaving out the earlier data? Well, the Mauna Loa CO2 measurements only start around 1960. Here's what the longer CO2 record looks like.
Figure 2. Background atmospheric carbon dioxide levels from Mauna Loa and ice core measurements from AD 1000 to 2010. Data: Ice Core Changshan
Ato2024 claims that the ice core records are inaccurate. However, as shown above, the ice core records agree closely both with each other and with the Mauna Loa measurements, which argues strongly against that claim.
Below is a closer view of the data since 1850, covering the same time span as the sea surface temperature (SST) data in Figure 1.
Figure 3. As in Figure 2, but showing only the data after 1850.
Since the ice cores agree well with each other and with the Mauna Loa data, I don't see any problem with considering this a good reconstruction of post-1850 CO2 levels.
The problem, of course, is that pre-1960 ocean temperatures look completely different from pre-1960 CO2 levels…and that divergence completely falsifies the claim of Ato2024. So he had to ignore it.
Next, how did he get such a high correlation (0.9995) between his SST-based prediction and the actual CO2 in the post-1960 data? Part of the answer lies in the data he used. Here is the post-1960 Mauna Loa CO2 record he used. Note that he used only annual data, not monthly data, which makes it easier to get a high Pearson correlation coefficient "r".
Figure 4. Mauna Loa Observatory CO2 observations and linear trend line.
The recent increase in carbon dioxide has been a very slowly accelerating curve, almost a straight line. This leads to many spurious correlations, because such curves are easy to reproduce, as we will see below. This is a recurring problem in climate science.
But this is only the first problem. The main problem is the method he uses. Here is the description from the paper.
Note that the symbol delta (Δ) in the equation means “change”. Therefore, ΔCO2 is the change in carbon dioxide from one year to the next.
Translated into plain language, it says:
- Calculate the best-fit linear estimate of the annual change in CO2 (ΔCO2) based on the Hadley HadSST sea surface temperature.
- The projected atmospheric CO2 is the starting atmospheric CO2 plus the cumulative sum of the estimated annual changes in CO2.
This is a graph from the first part of the calculation, fitting sea surface temperature to annual changes in carbon dioxide.
Figure 5. Annual changes in atmospheric CO2 (ΔCO2) after 1960, with the linear trend line of ΔCO2 and the best-fit estimate of ΔCO2 based on Hadley HadSST4.0.1.
Now, there's something odd about plotting ΔCO2, or any year-to-year difference for that matter: taking differences changes the shape of the data in two specific ways. I'll use the ΔCO2 graph in Figure 5 as an example.
First, any overall linear trend in the CO2 record turns into an overall offset from zero (a non-zero mean) in the ΔCO2 plot.
Second, any overall acceleration in the CO2 record turns into an overall linear trend in the ΔCO2 plot.
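Both points follow from a short calculation. Purely for illustration, assume the CO2 record is a quadratic in time, $\mathrm{CO_2}(t) = a + b\,t + c\,t^2$, where $b$ is the linear trend and $c$ the acceleration (the quadratic form and these symbols are my assumptions, not the paper's):

```latex
\begin{aligned}
\Delta\mathrm{CO_2}(t) &= \mathrm{CO_2}(t) - \mathrm{CO_2}(t-1) \\
  &= b\,[\,t - (t-1)\,] + c\,[\,t^2 - (t-1)^2\,] \\
  &= (b - c) + 2c\,t
\end{aligned}
```

So the linear trend $b$ shows up as the constant offset $b - c$, and the acceleration $c$ shows up as a linear trend of slope $2c$.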
So Figure 5 shows that the CO2 data has both a positive trend (the ΔCO2 values have a positive mean) and an acceleration (the ΔCO2 values have a positive trend). We can see both of these in Figure 3 above.
Having fitted the SST to the ΔCO2 data to estimate ΔCO2, we can simply add up these estimated changes, starting from the initial CO2 value, to estimate the underlying CO2 data. Here is the result.
Figure 6. Mauna Loa CO2 data and the Ato2024 estimate of the Mauna Loa CO2 data.
So far, I have replicated his results.
Now, remember I said that a correlation coefficient of 0.999+ means there's some fatal flaw in the logic. So… where is the flaw?
In his note asking me to look at this paper, our host Charles included an interesting artificial intelligence analysis of it, as follows (emphasis mine):
Based on my analysis of this paper, the key issue of circular reasoning appears in the method used to predict atmospheric carbon dioxide concentrations from sea surface temperature (SST) data. Specifically:
• The authors used multiple linear regression to derive an equation for annual CO2 increments versus sea surface temperature for the period 1960-2022.
• This equation is then used to “predict” carbon dioxide concentrations for the same period from 1960-2022.
• An extremely high correlation (r = 0.9995) was found between predicted and measured CO2 concentrations.
Circular reasoning occurs because the same data are used to derive the equation and test its predictive power. The key equations involved are:
Regression equation (from step 7 in the paper):
Annual CO2 increase = 2.006 × HAD-SST + 1.143 (after 1959)
Prediction equation:
$$[\mathrm{CO_2}]_n = \sum_{i=1}^{n} [\Delta\mathrm{CO_2}]_i + C_{st}$$
where $[\mathrm{CO_2}]_n$ is the predicted CO2 concentration, $[\Delta\mathrm{CO_2}]_i$ is the annual increment calculated from the regression equation, and $C_{st}$ is the actual CO2 concentration in the starting year.
By using this method, the authors essentially fit an equation to the data and then use the same fitted equation to “predict” the resulting data. This guarantees extremely high correlations but does not actually demonstrate any predictive power or causality.
A proper analysis would use separate training and test data sets, or employ techniques such as cross-validation, to avoid this circularity.
The extremely high correlation reported is almost certainly an artifact of this flawed methodology rather than evidence of a true relationship between sea surface temperatures and atmospheric carbon dioxide levels.
The AI is right. Well, partly right. It's correct that the problem is not that Ato2024 fits sea surface temperatures to carbon dioxide. The problem is that Ato2024 didn't hold back half of the data to test the results. It's easy to predict something when you already know the answer…
That's a big problem… and while it alone is enough to completely falsify the conclusion, there is another very big problem. To illustrate it, I used the Ato2024 method, but instead of using sea surface temperature as the input to fit the ΔCO2 data as Ato2024 did, I fitted a straight line to the ΔCO2 data. This is the blue line in Figure 5 above.
Using the Ato2024 method, I converted this straight line into equivalent CO2 data, shown in red in Figure 7 below.
Figure 7. As in Figure 6, plus a red line showing the result of using a simple straight line instead of the sea surface temperature (SST) used in Ato2024.
Interesting. When the Ato2024 method is used to fit a variable to ΔCO2, a straight line works just as well as an input as SST does.
But that doesn't show the full scope of the problem. To do that, I first split the SST, straight-line, and ΔCO2 data in half. I used the first half to fit either SST or the straight line to ΔCO2, and then used those fits to estimate the changes in carbon dioxide. Figure 8 shows the results.
Figure 8.
This figure reveals two separate issues. First, although the fit is much worse than in Figure 6, the Pearson correlation coefficient "r" is essentially unchanged… which means r is not an appropriate metric for this particular problem.
Second, the straight line continues to perform just as well as SST as the independent variable. This reveals a serious problem with the underlying Ato method.
To illustrate the problem, here is Figure 5 from above again.
To review: first, any overall linear trend in the CO2 record translates into an overall offset from zero (a non-zero mean) in the ΔCO2 plot.
Second, any overall acceleration in the CO2 record translates into an overall linear trend in the ΔCO2 plot.
This is the key. When you fit SST data (or, more importantly, any data) to ΔCO2 data, you will end up with a fitted signal that has the same non-zero mean and the same trend as the ΔCO2 data.
Not only that, but the fit is also balanced, with equal amounts above and below the trend line.
All of this guarantees that if you are trying to predict a smooth curve, then when you use the Ato2024 method to reconstruct the signal, you will get an answer that is very close to that smooth curve, no matter what variable you use to reconstruct it.
This is why using a straight line works just as well as using sea surface temperature or any other variable as the basis for CO2 estimates.
I weep for the death of honest peer review…
My best to everyone,
w.
Yes, you've heard it before: When you leave a comment, please quote the exact words you are discussing. This avoids endless misunderstandings.