A Python ecosystem for Climate Science

2019-10-21

I have recently shared a Python notebook using Google Colab (and GitHub). The code implements an entire climate science workflow, from data retrieval (seasonal forecasts and reanalysis) to the calculation of performance metrics (deterministic and probabilistic). I remember that years ago, when I started working on seasonal forecasts, the same workflow was much longer and more complicated: I was retrieving data from the ECMWF ECGATE cluster (based on AIX), post-processing it with GRIB tools and CDO, and then analysing it with MATLAB using metrics developed by me and my colleagues. Now the same workflow can be implemented 1) using open source tools and 2) in a reproducible way with minimal effort.

We can easily say that today we have a Python-based ecosystem for climate data analysis which includes:

xarray for multi-dimensional data manipulation
dask for out-of-core computation (look here for an example)
cfgrib to access GRIB data with xarray
numpy for scientific computing
cdsapi to retrieve data from the Copernicus Climate Data Store
eofs for EOF analysis
xskillscore for forecast verification scores
matplotlib and cartopy for mapping and data visualisation
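
To give a flavour of the last step of such a workflow, here is a minimal sketch of one deterministic and one probabilistic metric computed on an ensemble forecast. It is not the code from my notebook: the data are synthetic, the function names are my own for illustration, and I use only numpy to keep it self-contained. In practice you would probably hold the data in xarray objects and take the scores from a package such as xskillscore.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not real forecasts):
# 30 years of observed anomalies and a 10-member ensemble forecast.
rng = np.random.default_rng(42)
n_years, n_members = 30, 10
obs = rng.normal(size=n_years)
# Each member carries part of the observed signal plus its own noise.
ens = 0.6 * obs[:, None] + rng.normal(scale=0.8, size=(n_years, n_members))


def pearson_correlation(forecast, observed):
    """Deterministic metric: linear correlation between two series."""
    return float(np.corrcoef(forecast, observed)[0, 1])


def rmse(forecast, observed):
    """Deterministic metric: root mean squared error."""
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))


def brier_score(prob_forecast, event_observed):
    """Probabilistic metric: mean squared error of the forecast
    probabilities against the 0/1 outcomes (0 is a perfect score)."""
    return float(np.mean((prob_forecast - event_observed) ** 2))


# Deterministic verification uses the ensemble mean.
ens_mean = ens.mean(axis=1)

# Probabilistic verification: the event is "anomaly above the upper tercile
# of the observations"; the forecast probability is the fraction of members
# that exceed the same threshold.
threshold = np.quantile(obs, 2 / 3)
event_obs = (obs > threshold).astype(float)
event_prob = (ens > threshold).mean(axis=1)

print("correlation:", pearson_correlation(ens_mean, obs))
print("RMSE:       ", rmse(ens_mean, obs))
print("Brier score:", brier_score(event_prob, event_obs))
```

With real data the structure stays the same: only the two arrays at the top change, typically replaced by fields read from GRIB or NetCDF files and reduced along the time dimension.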

This list is not exhaustive and is based on my personal experience. If you think something is missing, feel free to send me a message or a tweet. You can also have a look at this page for an extended discussion of a Python stack for Atmospheric and Ocean Science, and at Pangeo for a Python ecosystem for Big Data in the geosciences.

And what about reproducibility? Part of it derives from the openness of software (and of data, as with the Copernicus Climate Data Store), but it is also made possible by amazing tools like Jupyter and services like GitHub/GitLab, Binder and Google Colab. Reproducibility is a fundamental topic in science, and building reproducible workflows should be a priority if we want a more transparent and fairer science; in other words, a better science.

