A good example of the sort of problems that can arise is provided by the hourly mean data from the Eskdalemuir station. This observatory has operated continuously since 1911, when it was established by Kew Observatory on a rural and magnetically exceptionally clean site after the Kew site was rendered too noisy by the introduction of trams into west London (Harrison, 2004). There was a discontinuity in 1932 in the commonly used set of hourly mean data from this station, which remained unnoticed until 2004, when Mursula et al. (2004) and Clilverd et al. (2005) analyzed the inter-hour variability of Eskdalemuir data and found very small values in the early part of the 20th century. Detective work by Leif Svalgaard established that prior to 1932 the data stored in the World Data Centre (WDC) system were 2-hour running means of the data recorded in the observatory yearbook. Such smoothing greatly influences inter-hour indices. MacMillan and Clarke (2011) have confirmed that this was indeed the case and digitised the data from the yearbook, so that all data from Eskdalemuir now available from WDC-C1 are hourly means with no running mean smoothing applied. (Users should check which dataset they are using, because data that have been corrupted or massaged are very hard to expunge from all datasets and bad data tend to resurface.) It is not known how, when, where, or why this post-processing was carried out, because the available metadata do not give the full provenance of the data. Presumably somebody, somewhere, believed that the noise suppression obtained by implementing a running mean was a good thing. If one used daily means of the (supposed) hourly data there would be some effect (as an hour of data from both the day before and the day after would be averaged in with half weight), but it would be small, and the effect on annual means would be negligible.
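The contrast between the large effect on inter-hour indices and the small effect on daily and annual means can be illustrated with a minimal sketch on synthetic data. Several things here are assumptions made purely for illustration: the smoothing is taken to be the average of each hourly mean with the following one (the exact kernel applied to the WDC data is not specified above), the index is a simple mean absolute hour-to-hour difference rather than any published inter-hour variability index, and the names `running_mean_2h`, `inter_hour_variability` and `daily_means` are hypothetical.

```python
import math
import random
import statistics

def running_mean_2h(hourly):
    """Average each hourly value with the next one (assumed smoothing kernel)."""
    return [0.5 * (a + b) for a, b in zip(hourly, hourly[1:])]

def inter_hour_variability(hourly):
    """Mean absolute difference between successive hourly values (simplified index)."""
    return statistics.fmean(abs(b - a) for a, b in zip(hourly, hourly[1:]))

def daily_means(hourly, n_days):
    """Mean of each consecutive block of 24 hourly values."""
    return [statistics.fmean(hourly[24 * d:24 * (d + 1)]) for d in range(n_days)]

N_DAYS = 365
rng = random.Random(0)

# Synthetic "true" hourly means: a diurnal variation plus hour-to-hour noise.
# One extra value is generated so the smoothed series still spans N_DAYS days.
raw = [10.0 * math.sin(2 * math.pi * i / 24) + rng.gauss(0.0, 5.0)
       for i in range(24 * N_DAYS + 1)]
smoothed = running_mean_2h(raw)  # 24 * N_DAYS values

# Inter-hour variability before and after smoothing.
ihv_raw = inter_hour_variability(raw[:24 * N_DAYS])
ihv_smoothed = inter_hour_variability(smoothed)

# Daily and annual means before and after smoothing.
daily_raw = daily_means(raw, N_DAYS)
daily_smoothed = daily_means(smoothed, N_DAYS)
max_daily_shift = max(abs(a - b) for a, b in zip(daily_raw, daily_smoothed))
annual_shift = abs(statistics.fmean(daily_raw) - statistics.fmean(daily_smoothed))

print(f"inter-hour variability: raw {ihv_raw:.2f}, smoothed {ihv_smoothed:.2f}")
print(f"largest change in any daily mean: {max_daily_shift:.3f}")
print(f"change in annual mean: {annual_shift:.5f}")
```

With this assumed kernel, each smoothed daily mean differs from the true one only through the two hourly values at the day boundaries, which is why the daily and annual means are almost unchanged while the hour-to-hour differences are strongly suppressed.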
It is fair to assume that whoever implemented the smoothing never envisaged the use of the data to generate an inter-hour variability index. This example illustrates very graphically the importance of knowing, as far as is possible, the true provenance of historic data and of all the corrections and changes that may subsequently have been applied to them. Lockwood et al. (2013a) have revealed a similar issue with data from Ekaterinburg by correlating the hourly mean data from a given station at different UTs as a check of data consistency: they found very high correlations around 1900, revealing that hourly values had been interpolated from sparser data.
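The idea behind such a consistency check can be sketched as follows. This is not the procedure of Lockwood et al. (2013a), only an illustration on synthetic data under stated assumptions: the "genuine" record is a daily level plus independent hourly noise, the "interpolated" record keeps every sixth hourly value and fills the rest by linear interpolation, and the helper names `pearson`, `interpolate_from_sparse` and `values_at_ut` are hypothetical.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def interpolate_from_sparse(hourly, step):
    """Keep every `step`-th value and linearly interpolate the hours in between."""
    out = []
    last = len(hourly) - 1
    for i in range(len(hourly)):
        lo = (i // step) * step
        hi = min(lo + step, last)
        if hi == lo:
            out.append(hourly[lo])
        else:
            frac = (i - lo) / (hi - lo)
            out.append((1 - frac) * hourly[lo] + frac * hourly[hi])
    return out

def values_at_ut(hourly, ut, n_days):
    """Series of the value at a fixed UT hour, one entry per day."""
    return [hourly[24 * d + ut] for d in range(n_days)]

N_DAYS = 1000
rng = random.Random(1)

# Synthetic genuine hourly means: a day-to-day level plus independent hourly noise.
genuine = []
for _ in range(N_DAYS + 1):
    level = rng.gauss(0.0, 3.0)
    genuine.extend(level + rng.gauss(0.0, 5.0) for _ in range(24))
genuine = genuine[:24 * N_DAYS + 1]

# The same record, but with only every sixth hour actually observed.
interpolated = interpolate_from_sparse(genuine, step=6)

# Correlate the day-by-day series at two neighbouring UTs (01 and 02 UT).
r_genuine = pearson(values_at_ut(genuine, 1, N_DAYS),
                    values_at_ut(genuine, 2, N_DAYS))
r_interp = pearson(values_at_ut(interpolated, 1, N_DAYS),
                   values_at_ut(interpolated, 2, N_DAYS))

print(f"correlation between 01 and 02 UT: genuine {r_genuine:.2f}, "
      f"interpolated {r_interp:.2f}")
```

In the interpolated record, the values at 01 and 02 UT are both weighted averages of the same two observations, so their correlation is close to unity. An anomalously high correlation between different UTs is therefore a warning that the "hourly" values are not independent measurements.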
This is a vitally important concern for reconstruction work: being overly ready to accept an adjustment is highly irresponsible, as it could deny future generations of scientists the opportunity to properly exploit the data or, in a worst-case scenario, seriously mislead them (Council of AGU, 2009; Vogel, 1998).