Critics often complain that the three major surface temperature records (NASA’s GISTemp, the University of East Anglia’s HadCRUT, and the record from NOAA’s National Climatic Data Center) all rely on much of the same underlying station data, provided through the Global Historical Climatology Network (GHCN), and that they are therefore all dependent on, and vulnerable to, shortcomings in GHCN.
GHCN comprises around 7,000 station records at 4,500 different locations. The station records span the period from 1700 to the present. After 1880, GHCN contains a sample of stations distributed well enough to allow a reconstruction of global temperature.
Recent work by an amateur science blogger and software engineer named Ron Broberg has dramatically expanded the number of stations available. Broberg’s work involved parsing daily temperature data from 10,000 additional stations in NOAA’s Global Summary of Day (GSOD) network into a form readily usable for climate analysis.
The data can be used to fill in some of the regional gaps in GHCN that have cropped up in recent years as the number of stations available decreased. It can also be used as an independent check that temperature reconstructions produced using GHCN data are in line with raw data from other stations.
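To make the idea of turning daily readings into something usable for climate analysis more concrete, here is a minimal Python sketch of one such step: collapsing daily temperatures into monthly means. It is not Broberg’s code. The input layout (tuples of station ID, ISO date string, and temperature), the function name `monthly_means`, and the `min_days` completeness threshold are all assumptions made for illustration, and the actual GSOD file format is not modeled here.

```python
from collections import defaultdict


def monthly_means(daily_records, min_days=20):
    """Average daily temperatures into monthly means per station.

    daily_records: iterable of (station_id, "YYYY-MM-DD", temperature)
                   tuples; this layout is hypothetical, not the GSOD format.
    min_days:      assumed completeness threshold; months with fewer valid
                   daily values are dropped rather than averaged.
    Returns a dict mapping (station_id, year, month) -> mean temperature.
    """
    buckets = defaultdict(list)
    for station_id, date, temp in daily_records:
        if temp is None:  # skip missing observations
            continue
        year, month = int(date[0:4]), int(date[5:7])
        buckets[(station_id, year, month)].append(temp)

    # Keep only months with enough daily values to be representative.
    return {
        key: sum(values) / len(values)
        for key, values in buckets.items()
        if len(values) >= min_days
    }
```

The completeness threshold matters because a month averaged from only a handful of days can be badly skewed. The right cutoff is a judgment call, and different data providers define daily and monthly means differently.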
Spurred by Citizen Scientist Suggestion
Broberg was inspired to undertake the project by a remark from climate scientist Gavin Schmidt of the NASA Goddard Institute for Space Studies. Schmidt said:
If people were looking for a “citizen science” project to work on, coming up with a way for the SYNOP data (available via WeatherUnderground etc.) to be made commensurate with the CLIMAT data (available via GHCN), would be a great one. There are some subtleties involved (definitions of daily and monthly means vary among providers), but that would provide an interesting back-up and comparison to the CLIMAT-derived summaries from GISTEMP, HadCRU or NCDC.
GSOD is a compilation of SYNOP data available from 1929 to present, though coverage is quite poor prior to 1973. After 1973, however, GSOD contains many times more station data than does GHCN, and the GSOD data is reasonably well distributed across the globe, as illustrated by the graphic below.
GSOD data can be used to reconstruct global and regional temperatures and compare the results to traditional GHCN-based reconstructions. The figure below shows the number of stations globally available in GSOD and GHCN for each year:
The number of GSOD stations available significantly exceeds the number of GHCN stations after 1992, when GHCN last undertook a major retroactive compilation of station records. (It is worth noting that GHCN version 3.1, currently in the works, will update the records to include data up to the present for many of these stations.) Scientists can use GSOD as a further demonstration that the “station dropoff” in GHCN has had no major effect on the temperature record.
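A station count of this kind is simple to compute once the data is in monthly form. The sketch below assumes records keyed by (station ID, year, month); that key layout and the function name `stations_per_year` are illustrative choices, not part of either dataset.

```python
from collections import defaultdict


def stations_per_year(monthly_keys):
    """Count distinct stations reporting in each year.

    monthly_keys: iterable of (station_id, year, month) tuples
                  (a hypothetical layout chosen for this example).
    Returns a dict mapping year -> number of distinct stations,
    ordered by year.
    """
    seen = defaultdict(set)
    for station_id, year, _month in monthly_keys:
        seen[year].add(station_id)  # a set ignores repeat months
    return {year: len(ids) for year, ids in sorted(seen.items())}
```

Running the same function over both datasets would give two count series that can be plotted side by side. Note that this counts a station as available if it reports in any month of the year, which is only one of several reasonable definitions.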
The chart below shows global temperature reconstructions using GHCN data and GSOD data post-1973, when the number of stations available in GSOD became comparable to GHCN. While the GSOD record runs slightly colder than GHCN, the results are quite similar.
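Comparisons like this are usually made on temperature anomalies (departures from each record’s own average over a common baseline period) rather than on absolute temperatures. The sketch below shows that calculation and a simple mean difference between two series. The function names, the dict-of-years input, and the baseline years passed in are assumptions for illustration, and it does not reproduce the method behind the chart.

```python
def anomalies(series, base_start, base_end):
    """Convert a yearly temperature series to anomalies.

    series: dict mapping year -> temperature (hypothetical layout).
    base_start, base_end: inclusive baseline years, chosen by the caller.
    Returns a dict mapping year -> temperature minus the baseline mean.
    """
    baseline = [t for y, t in series.items() if base_start <= y <= base_end]
    if not baseline:
        raise ValueError("no data in baseline period")
    base_mean = sum(baseline) / len(baseline)
    return {year: temp - base_mean for year, temp in series.items()}


def mean_difference(a, b):
    """Average of a[year] - b[year] over the years both series share."""
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("series share no years")
    return sum(a[y] - b[y] for y in common) / len(common)
```

When both series are expressed relative to the same baseline years, their difference averages to zero over those years by construction, so any remaining offset outside the baseline reflects a difference in trend, not in absolute level.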
The scientific community can also look at GSOD data in regions where GHCN may have limited coverage. In the Arctic, for example, GSOD has had three to four times as many stations available over the past 30 years. The figure below compares the temperature record and the number of stations available for the Arctic in GHCN and GSOD:
Researchers and science bloggers are just starting to see what they can do with these alternative datasets. Initial results should be interpreted with great caution until more work has been done to ensure that the GSOD stations selected meet the same level of quality control that has been applied to GHCN.
That said, the initial results show that temperatures independently reconstructed from raw GSOD data are closely in line with the major GHCN-based land temperature series. If that remains the case over time, it might allay some of the concerns critics have expressed about over-reliance on the same underlying GHCN station data.