REOS Model Evaluation

Model evaluations involve direct comparison of simulations with analyzed data as well as with derived products associated with specific processes. For example, comparisons of simulated T-S profiles to data are far more relevant when extended to water mass analysis, tracer transports, etc. For climate variations such as ENSO, in addition to determining if the simulated SST variability is reasonable, it is important to evaluate how well particular processes are represented or parameterized, such as equatorial wave propagation, thermocline structure, vertical mixing, etc. Such evaluations are not as simple as direct comparisons with data, but they are generally critical for establishing the physical integrity of a model simulation.

Evaluation of Multi-Centennial Ocean-Ice (CORE I) Simulations

The comprehensive suite of diagnostics used to evaluate CORE I simulations is presented here as a WGOMD guideline for assessing global ocean and ice simulations using observational datasets (Griffies et al., 2009). The paper discusses the choice of diagnostics and observations systematically.

1/ Globally averaged ocean temperature and salinity with time. Assuming no internal sources or sinks, this depends on air-sea and ice-sea exchange and is a diagnostic for model drift.

2/ Horizontally averaged temperature and salinity with time. Annual mean anomalies relative to observations (World Ocean Database 2005 (Boyer et al., 2006), PHC Global Ocean Hydrography (Steele et al., 2001)) show the evolution of biases throughout the water column.

3/ Global maps of SST and SSS bias. Anomalies relative to observations (World Ocean Database 2005 (Boyer et al., 2006), PHC Global Ocean Hydrography (Steele et al., 2001) for the Arctic).

4/ Annual cycle of heat content (vertically integrated temperature) over the upper 250m and SST at Ocean Weather Ship Echo (in the subtropical Atlantic at 35N, 48W), compared to observations (World Ocean Database 2005 (Boyer et al., 2006)). Monthly values are diagnosed, defining hysteresis loops characterizing the seasonal cycle of the near-surface thermal properties.

5/ Sea ice concentrations (area of sea ice per grid cell area) compared to satellite sea ice concentration climatology for the years 1979-2004 compiled by Comiso (1999, updated 2005). Temporal evolution of annual mean sea ice area in both Hemispheres and maps of maximum sea ice area (March and September for the Northern and Southern Hemispheres, respectively).

6/ Equatorial Pacific upper ocean temperature and zonal velocity evaluation. Comparison to isopycnal analysis by Johnson et al. (2002).

7/ Mixed layer depths. For a global estimate of mixed layer depth, the observational estimate is derived from a long-term monthly mean climatology of <theta - S> from the World Ocean Atlas (Locarnini et al., 2006, Antonov et al., 2006).

8/ Global zonal average potential temperature and salinity anomalies in the latitude-depth plane, compared to observations (World Ocean Database 2005 (Boyer et al., 2006), PHC Global Ocean Hydrography (Steele et al., 2001) for the Arctic), as a means to evaluate water mass formation processes. England and Maier-Reimer (2001) show that CFC and radiocarbon can also be used in this context.

9/ Vertically integrated volume transport through the Drake Passage. The Drake Passage transport has been measured using various methods, with estimates ranging from a low of around 100 Sv (Orsi et al., 1995) to a high of 135 Sv (Cunningham et al., 2003), and a range of 134 +/- 13 Sv given by Whitworth (1983) and Whitworth and Peterson (1985).

10/ Poleward heat transport. There are ambiguities as to the choice of which atmospheric product should be used to calculate the implied heat transport (e.g. see Taylor, 2000). See the discussion in Griffies et al. (2009) on two different approaches to calculate the implied heat transport.

11/ Meridional overturning streamfunction. The North Atlantic thermohaline circulation is examined in the Atlantic MOC streamfunction, while the Southern Ocean thermohaline circulation is examined in the global streamfunction.
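
Several of the diagnostics listed above reduce to weighted sums over the model grid. The following Python sketch illustrates items 1, 4, 5 and 9 under simple assumptions: fields are NumPy arrays, land points are marked with NaN, and the constants RHO0 and CP are typical reference values chosen here for illustration. The function names are hypothetical and are not taken from Griffies et al. (2009) or from any model's diagnostics package.

```python
import numpy as np

RHO0 = 1035.0   # reference seawater density in kg/m^3 (assumed value)
CP = 3992.0     # seawater specific heat in J/(kg K) (assumed value)


def global_mean(field, volume):
    """Volume-weighted mean of a 3-D field; NaN marks land points."""
    wet = np.isfinite(field)
    return np.sum(field[wet] * volume[wet]) / np.sum(volume[wet])


def upper_heat_content(temp, dz, depth_limit=250.0):
    """Heat content per unit area (J/m^2) of one column above depth_limit.

    temp and dz are 1-D arrays ordered from the surface downward.
    """
    bottom = np.cumsum(dz)   # depth of each layer's lower face
    top = bottom - dz        # depth of each layer's upper face
    # Part of each layer's thickness that lies above depth_limit.
    thick = np.clip(np.minimum(bottom, depth_limit) - top, 0.0, None)
    return RHO0 * CP * np.sum(temp * thick)


def sea_ice_area(concentration, cell_area):
    """Total ice area: concentration (0 to 1) times cell area, summed."""
    return np.nansum(concentration * cell_area)


def section_transport_sv(u_normal, dz, dx):
    """Volume transport through a section in Sv (1 Sv = 1e6 m^3/s).

    u_normal has shape (depth, along_section); dz and dx are 1-D arrays.
    """
    return np.nansum(u_normal * dz[:, None] * dx[None, :]) / 1.0e6
```

Calling global_mean on each annual mean of temperature or salinity gives the drift time series of item 1, and section_transport_sv applied to the velocity normal to a Drake Passage section gives a number comparable to the observed estimates in item 9. A real model grid may need partial bottom cells or time-varying layer thicknesses, which this sketch ignores.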

Workshop on Ocean Model Metrics: Recommendations on evaluating ocean models for climate studies

Workshop organised by L. Thompson (University of Washington) and J. McClean (Scripps Institution of Oceanography) on 25th February 2006 in Hawaii, USA. The notes below are a summary of points raised in the workshop summary report, available in full here.

1/ Metrics that are used to evaluate the mean state and errors related to large scale biases include:

- Volume transport through key inter-basin exchanges (e.g. Drake Passage, Indonesian Throughflow, Florida Straits)
- Mean SST and SSS
- Meridional heat transport
- Seasonal sea ice extent

It should be noted that gridded observational datasets commonly used for this analysis, such as the World Ocean Atlas Temperature and Salinity Climatology, have been interpolated so that there is a value at every space-time grid point. This leads to problems such as density being statically unstable at the surface of about 1/3 of the world's oceans. In the Tropics, the equatorial thermocline is over-stratified due to meridional averaging of the data (Large and Danabasoglu, 2006).
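
The surface instability mentioned above can be checked by comparing the density of the top two levels of a gridded climatology. The sketch below is illustrative only: it assumes a linear equation of state with typical expansion coefficients (a real analysis would use a full equation of state for potential density), arrays of shape (depth, ny, nx) with the surface at index 0, and NaN on land. All names and constants are hypothetical.

```python
import numpy as np


def linear_density(temp, salt, rho0=1027.0, alpha=2.0e-4, beta=7.6e-4,
                   t0=10.0, s0=35.0):
    """Density from a linear equation of state (a deliberate simplification)."""
    return rho0 * (1.0 - alpha * (temp - t0) + beta * (salt - s0))


def unstable_surface_fraction(temp, salt, cell_area):
    """Area fraction of the ocean where level 0 is denser than level 1."""
    rho = linear_density(temp, salt)
    wet = np.isfinite(rho[0]) & np.isfinite(rho[1])
    # Fill land points with zeros so the comparison never sees NaN.
    top = np.where(wet, rho[0], 0.0)
    below = np.where(wet, rho[1], 0.0)
    unstable = wet & (top > below)
    return np.sum(cell_area[unstable]) / np.sum(cell_area[wet])
```

Applying the same function to model output and to the climatology shows whether an apparent model bias in near-surface stratification partly reflects an artifact of the interpolated observations.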

Errors can be present in sparsely sampled transport estimates. The best transport estimate of the Gulf Stream is from the cable data collected between Florida and the Bahamas. The best record of the Indonesian Throughflow is from the WOCE IX1-XBT line of the top 800m between Australia and Java that has been sampled for over two decades (Meyers et al., 1996). Meridional heat transport estimates are generally constructed from one-time sections and so may only be representative of the particular climate regime that was present during sampling.

2/ Evaluating the model response to changes in atmospheric forcing.

Compared to observations, this can demonstrate the model performance in terms of upper ocean dynamics, though model error and forcing error can have compensating effects. Since the use of bulk formulae for turbulent heat fluxes can help constrain errors in SST, upper ocean heat content and mixed layer depth can be a better test of model performance than SST.

The relationship between SST, heat content, heat flux and wind stress can be compared with that found in observations for western boundary current regions and the tropical current systems. The correlation between heat content and sea surface height can be compared with that derived from observations, provided they come from comparable space-time scales.

3/ Calculating heat content from observations.

Profiling float data sample only the upper ocean and the records are not long enough to evaluate climate-scale variability, while XBT samples are biased toward particular regions and ship tracks. Altimetry can provide estimates of ocean volume, though this reflects changes in both heat and salinity, rather than heat content directly. Combining altimetric, profiling float and XBT data is optimal. Acoustic thermometry can also be used to obtain large-scale heat content.

4/ Wind stress products.

There are multiple wind stress products available with different temporal and spatial sampling characteristics. See Gille et al. (2005) for a comparison of different gridded wind products. For example, QuikSCAT winds are low in energy compared to other gridded scatterometer products. Extreme events can be captured by blended products that merge the high wavenumber information available from observations with high-frequency numerical weather prediction fields. Reanalysis wind products are often used to force models, though care should be taken with heat fluxes calculated directly with models, and differences in the products should be assessed (e.g. NCEP2 winds are weaker and better than NCEP1).

5/ Evaluating coastal or equatorial upwelling.

Low spatial resolution is a leading cause of poorly resolved upwelling. Defining upwelling in an ocean model in a consistent way is problematic, as is defining an observationally based index that can serve as a metric for evaluating the model and for identifying whether model error is due to the wind forcing or the model physics. The Bakun index (Bakun, 1973) defines upwelling based on longshore winds only and can be noisy. Scatterometer data can be used as an additional wind field. Defining an upwelling index based on SST would be useful since upwelling has an impact on heat fluxes.
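
A wind-based upwelling index of the kind described above can be sketched as the offshore Ekman transport implied by the alongshore wind stress. The Python example below is a minimal illustration, not Bakun's original procedure: it takes the alongshore stress as given rather than deriving it from pressure fields, and the sign convention, function names and constants are assumptions made here. The stress is taken as positive when the coast lies to the left of the wind direction, so that a positive index means offshore transport in either hemisphere.

```python
import numpy as np

OMEGA = 7.2921e-5   # Earth's rotation rate in rad/s
RHO_SEA = 1025.0    # seawater density in kg/m^3 (assumed value)


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude) in 1/s."""
    return 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))


def ekman_upwelling_index(tau_along, lat_deg):
    """Offshore Ekman volume transport in m^3/s per 100 m of coastline.

    tau_along is the alongshore wind stress in N/m^2, positive with the
    coast on the left when looking downwind. Not valid near the equator,
    where f approaches zero.
    """
    return 100.0 * tau_along / (RHO_SEA * coriolis(lat_deg))
```

Because the index depends only on the wind, computing it from both the model's forcing and a scatterometer product gives one way to separate forcing error from model physics, as discussed above.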

6/ Evaluating eddy permitting and eddy resolving frameworks.

Tests include eddy variability, decorrelation time scales and other eddy characteristics. The model can be evaluated against hydrographic sections, the global drifting buoy dataset and sea surface height (SSH) derived from altimetry. Eddy statistics should be calculated from geostrophic velocity calculated from along-track SSH rather than an optimally interpolated product. The high level of inherent variability in repeat and one-time hydrographic sections can yield useful information with careful co-located extractions in time and space. Hydrography sections can be compared more directly with non-eddy resolving ocean simulations once the hydrography has been smoothed to the scales appropriate for the model.
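
The along-track calculation mentioned above uses the geostrophic relation between the SSH slope and the velocity component normal to the track, v = (g / f) * d(eta)/ds. The sketch below assumes 1-D NumPy arrays of SSH (m), along-track distance (m) and latitude (degrees); the function name and the 5 degree equatorial cutoff are choices made here for illustration, and real altimeter data would normally be filtered before differentiation.

```python
import numpy as np

G = 9.81            # gravitational acceleration in m/s^2
OMEGA = 7.2921e-5   # Earth's rotation rate in rad/s


def cross_track_velocity(ssh, dist, lat_deg, min_lat=5.0):
    """Cross-track geostrophic velocity (m/s) from along-track SSH.

    Positive values point to the left of the direction of increasing dist.
    Points within min_lat degrees of the equator are returned as NaN
    because geostrophy breaks down as f approaches zero.
    """
    ssh = np.asarray(ssh, dtype=float)
    lat_deg = np.asarray(lat_deg, dtype=float)
    f = 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))
    slope = np.gradient(ssh, np.asarray(dist, dtype=float))
    vel = np.full(ssh.shape, np.nan)
    ok = np.abs(lat_deg) >= min_lat
    vel[ok] = G * slope[ok] / f[ok]
    return vel
```

The variance of the velocity anomaly over repeat passes of the same track then gives an eddy variability estimate that can be compared with the model sampled along the same track.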

7/ Comparing multiple models.

A comparative measure of a particular quantity from two or more models can be made using a Taylor diagram (Taylor, 2001), which requires a correlation and a standard deviation. This technique should be integrated into mainstream ocean model analysis.
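
The statistics plotted on a Taylor diagram can be computed as in the Python sketch below. It is unweighted for brevity, so it assumes equal-area samples; gridded fields would normally need area weights. The function name and the returned dictionary keys are hypothetical. The centered RMS difference E' satisfies E'^2 = s_m^2 + s_r^2 - 2 * s_m * s_r * R, where s_m and s_r are the model and reference standard deviations and R is their correlation, which is the relation that allows all three quantities to appear on one diagram.

```python
import numpy as np


def taylor_statistics(model, reference):
    """Correlation, standard deviations and centered RMS difference."""
    model = np.asarray(model, dtype=float).ravel()
    reference = np.asarray(reference, dtype=float).ravel()
    m_anom = model - model.mean()
    r_anom = reference - reference.mean()
    std_model = np.sqrt(np.mean(m_anom ** 2))
    std_ref = np.sqrt(np.mean(r_anom ** 2))
    corr = np.mean(m_anom * r_anom) / (std_model * std_ref)
    crmsd = np.sqrt(np.mean((m_anom - r_anom) ** 2))
    return {"corr": corr, "std_model": std_model,
            "std_ref": std_ref, "crmsd": crmsd}
```

Running the function once per model against the same reference field gives one point per model on the diagram.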

Evaluation of UK Met Office HadGEM3

- How the metrics proposed by the UK Met Office Hadley Centre compare with the metrics GSOP is considering

- EN3/Levitus is used to evaluate temperature and salinity and de Boyer Montégut (2004) is used for the mixed layer depth. The sea ice is currently assessed with HadISST for concentrations, and Fowler is used for ice velocities.

Evaluation of NCAR CCSM3

- An example of the evaluation of CCSM3 ocean model output is given here. Surface fluxes are compared to Large and Yeager (2004) and temperature and salinity are compared to the Levitus / PHC2 data set for the standard, initial evaluation of model solutions. The WOCE climatology is also used to evaluate temperature and salinity distributions. If the model solutions include CFC11 and CFC12 data sets, then their distributions, penetration depths, and global inventories are separately compared to the GLODAP data set. For hindcast simulations, model solutions are compared to WOCE sections for a particular year and month. CFC and WOCE section comparisons are planned to be included in the standard diagnostics package soon.

Other important metrics are the ACC transport at Drake Passage, Atlantic meridional overturning circulation, Indonesian Throughflow, and the meridional heat transport in the Atlantic.

The following references include some examples of these comparisons:

Yeager, S.G. and W.G. Large, 2004: Late-Winter Generation of Spiciness on Subducted Isopycnals, J. Phys. Oceanogr., 34, 1528-1547.

Gent, P.R., F.O. Bryan, G. Danabasoglu, K. Lindsay, D. Tsumune, M.W. Hecht, and S.C. Doney, 2006: Ocean chlorofluorocarbon and heat uptake during the 20th century in the CCSM3. J. Climate, 19, 2366-2381.

Doney, S.C., S. Yeager, G. Danabasoglu, W.G. Large, and J.C. McWilliams, 2007: Mechanisms governing interannual variability of upper ocean temperature in a global ocean hindcast simulation. J. Phys. Oceanogr., 37, 1918-1938.

Large, W.G., and G. Danabasoglu, 2006: Attribution and impacts of upper ocean biases in CCSM3. J. Climate, 19, 2325-2346.

Evaluation of the CCSM Ocean Model

Download the PowerPoint presentation "Metrics for the CCSM Ocean Model" by S. Jayne and J. McClean, a talk given in December 2006.