To evaluate the probabilistic forecasts produced by seasonal forecasting systems, it is very common to use skill scores. A **skill score** is normally defined as the ratio between a specific accuracy measure computed on the forecast and the same measure applied to a reference forecast. In the case of seasonal forecasts, the climatological probability — that is, the observed frequency of the target event (the event predicted by the forecast) in the past — is commonly used as the reference forecast. Say I want to predict the probability that the temperature in the coming season will be more than 3 degrees above normal; to assess the quality of my forecast, I would use as reference the frequency with which the seasonal temperature exceeded normal by more than 3 degrees over the past 20, 30 or 40 years (depending on the availability of data). This means that I would compare my forecast with a static forecast, something like “*ok, the event happened 10% of the time in the last 30 years, so I assume that the probability for the coming season is exactly that value, 10%*“.
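As a concrete illustration of this idea, here is a minimal sketch of a skill score computed against a static climatological reference, using the Brier score as the accuracy measure. The function names and the toy numbers are mine, for illustration only, and the skill score is written in the common `1 - score/reference` form:

```python
import numpy as np

def brier_score(probs, outcomes):
    # Mean squared difference between the forecast probability and
    # the binary outcome (1 if the event occurred, 0 otherwise).
    return np.mean((np.asarray(probs) - np.asarray(outcomes)) ** 2)

def brier_skill_score(forecast_probs, outcomes, climatology_prob):
    # Reference forecast: the static climatological probability,
    # issued identically for every case.
    ref_probs = np.full(len(outcomes), climatology_prob)
    bs_forecast = brier_score(forecast_probs, outcomes)
    bs_reference = brier_score(ref_probs, outcomes)
    # > 0: forecast improves on climatology; 0: no improvement;
    # < 0: forecast is worse than the static climatological forecast.
    return 1.0 - bs_forecast / bs_reference

# Toy example: the event "seasonal temperature more than 3 degrees
# above normal" observed 2 times in 10 past seasons (frequency 0.2).
outcomes = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
forecasts = [0.1, 0.2, 0.7, 0.1, 0.1, 0.2, 0.6, 0.1, 0.2, 0.1]
print(brier_skill_score(forecasts, outcomes, climatology_prob=0.2))
```

In this toy case the forecast assigns higher probabilities to the seasons where the event actually occurred, so the skill score comes out positive.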

This is not the only option. The IRI, in their *Descriptions of the IRI Climate Forecast Verification Scores*, state:

> Note that using climatology forecasts as the reference for comparison is not the only reasonable option. Other options may be random non-climatology forecasts, or damped persistence forecasts from the previous season’s observations.
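The “damped persistence” alternative mentioned in the quote can be sketched in a few lines: the previous season’s anomaly is pulled back toward climatology in proportion to the lag-1 correlation between consecutive seasons. This is only an illustrative version under simple assumptions; the function name and the way the correlation is estimated are mine, not IRI’s:

```python
import numpy as np

def damped_persistence_forecast(history, prev_value):
    # history: past seasonal values, used to estimate both the
    # climatology and the lag-1 autocorrelation between seasons.
    history = np.asarray(history, dtype=float)
    clim = history.mean()
    anomalies = history - clim
    # Correlation between each season's anomaly and the next one's.
    r = np.corrcoef(anomalies[:-1], anomalies[1:])[0, 1]
    # Forecast = climatology plus the previous anomaly, damped by r.
    # With r = 0 this collapses to the pure climatology forecast;
    # with r = 1 it is pure persistence.
    return clim + r * (prev_value - clim)
```

For example, with a perfectly alternating history `[1.0, 2.0, 1.0, 2.0]` the estimated lag-1 correlation is -1, so after a season at 2.0 the sketch forecasts 1.0.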

However, in the area of seasonal forecasting, skill scores are almost always evaluated using climatology as the reference forecast. Recently, I published on the EarthArXiv repository a draft paper, written with Carlo Buontempo, titled “*What is users’ next best alternative to the use of dynamical seasonal predictions?*“. The paper may seem a bit controversial, but we are not saying that seasonal forecasts are not good (enough); rather, we are convinced that climatology (or even persistence) “*does not necessarily represent a good proxy for the value the users may see in these predictions*“. We think this is really important from a climate service perspective, because:

> the existence of simple alternative models with similar skill [of dynamical seasonal forecasts] could represent a stimulus for further research whilst at the same time providing a natural benchmark for evaluating more complex kind of predictions.

Moreover, although this is not mentioned in the paper, we should also take into account the computational requirements of the seasonal forecasts we provide. To this end, when comparing skill we might also look at the difference in computational time, so as to weigh the cost of a forecast against its quality.