The Zarr factor: why Zarr is the new language of climate intelligence

(This article has been published on LinkedIn)


Zarr is not just a file format. Traditionally, climate and weather data are stored in large monolithic files designed for massive storage systems and supercomputers: this imposes a heavy data engineering overhead and can make proper governance (legacy/lineage) difficult. Zarr, instead, breaks datasets into small, compressed pieces called chunks. This architecture allows surgical data retrieval: to extract, say, 80 years of data over a specific tiny region for a risk report, researchers can pull out exactly the “cube” of data they need rather than downloading the whole archive.

For a large organisation like a global bank, this makes it possible to:

  1. Reduce redundancy and shadow data: no more local copies of data, with their extra costs and risk of version mismatch. This makes it easier to maintain a centralised Source of Truth.
  2. Raise productivity (no more data engineering tax): less time spent “wrangling” data (downloading, converting, cleaning). In the era of ARCO (Analysis-Ready, Cloud-Optimised) datasets, researchers can spend their time creating insights without large teams of data engineers troubleshooting file formats and managing storage.
  3. Lower the barrier to entry: a notebook with Python/Xarray running in Fabric/Databricks, or even on your laptop, is enough to perform climate analysis, especially when a local or external marketplace handles all the rest (backend, metadata).

Copernicus/ECMWF created the first successful climate data store; the large cloud providers (Google, AWS, Azure) proved the value and scalability of ARCO data; Destination Earth (DestinE) is scaling that access to the European level. Now, platforms like Earthmover are taking the final, crucial step: making it simple. By providing a streamlined marketplace, they are turning a complex scientific challenge into a plug-and-play utility for the global economy.
