The AirfRANS dataset is a collection of numerical simulations solving the incompressible Reynolds-Averaged Navier-Stokes equations over two-dimensional airfoils in a subsonic flight regime. The associated paper has been accepted at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. In addition to this library, two GitHub repositories are provided to reproduce the experiments of the paper and to generate new compressible or incompressible simulations over NACA airfoils. The setup used to generate those simulations has been compared against the Langley Research Center experiments available in the Turbulence Modeling Resource for the NACA 0012 and NACA 4412 airfoils.

This dataset has been built to lower the potential barrier between the Machine Learning and Physics research communities. It provides data on a simple but realistic case that already includes some of the major challenges of applying Machine Learning to Fluid Dynamics, namely:

  • working with unstructured data coming from raw numerical simulations,

  • being able to deal with the number of nodes required in simulation meshes (from hundreds of thousands to hundreds of millions in 3D cases),

  • treating cases with a realistic Reynolds number,

  • regressing the entire velocity, pressure and turbulent viscosity fields from a geometry and the boundary conditions,

  • being accurate on global forces or coefficients such as drag and lift,

  • being consistent between the predicted fields and the predicted forces,

  • accurately regressing boundary layers and regions of the simulation where sharp signals appear,

  • producing solutions that respect the mass conservation (continuity) equation and the momentum equations.

We hope that this library will ease the manipulation of such simulations and the usage of the AirfRANS dataset.
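
To make the "global coefficients" challenge listed above concrete, here is a minimal sketch of how drag and lift coefficients are conventionally obtained from a force on a 2D airfoil. It is not part of the AirfRANS library: the function name `force_coefficients` and all of its parameters are hypothetical, and it assumes the force per unit span is given in a frame where the freestream makes an angle of attack with the x-axis.

```python
import math


def force_coefficients(fx, fy, angle_of_attack_deg, rho, u_inf, chord=1.0):
    """Return (C_D, C_L) for a 2D airfoil.

    fx, fy: force per unit span in the mesh frame (hypothetical inputs).
    rho, u_inf: freestream density and velocity magnitude.
    chord: reference length used for normalization.
    """
    alpha = math.radians(angle_of_attack_deg)
    # Drag is the force component along the freestream direction,
    # lift is the component normal to it.
    drag = fx * math.cos(alpha) + fy * math.sin(alpha)
    lift = -fx * math.sin(alpha) + fy * math.cos(alpha)
    # Dynamic pressure times the reference length (per unit span in 2D).
    q_ref = 0.5 * rho * u_inf ** 2 * chord
    return drag / q_ref, lift / q_ref


# Illustrative values only, not taken from the dataset.
c_d, c_l = force_coefficients(fx=1.0, fy=10.0, angle_of_attack_deg=0.0,
                              rho=1.0, u_inf=2.0)
```

Computing the coefficients from the integrated predicted fields and comparing them with directly predicted values is one way to probe the consistency between fields and forces mentioned in the list.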

Raw OpenFOAM data

The dataset comes in different forms. A pre-processed version, made of cropped simulations and including only the minimum number of fields, is provided as a working basis. A full raw OpenFOAM version is also available, but manipulating it requires some basic knowledge of how OpenFOAM works. The raw data nevertheless include more information than the processed data, which is especially useful if you are interested in the gradients of the fields. Each raw simulation contains every term of the momentum and conservation equations as fields, which could be used, for example, to compare the gradients of an approximation with the gradients of the simulation.
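
As an illustration of the kind of gradient-based check described above, the sketch below evaluates the residual of the incompressible continuity equation, du/dx + dv/dy = 0, with central finite differences. It is a simplified stand-in under strong assumptions: the velocity is sampled on a uniform Cartesian grid (the actual simulations live on unstructured meshes), and the helper names `divergence_2d` and `max_abs` are hypothetical, not functions of this library or of OpenFOAM.

```python
import math


def divergence_2d(u, v, dx, dy):
    """Central-difference divergence du/dx + dv/dy at interior nodes.

    u[i][j] and v[i][j] are sampled at x = i * dx, y = j * dy.
    """
    nx, ny = len(u), len(u[0])
    div = []
    for i in range(1, nx - 1):
        row = []
        for j in range(1, ny - 1):
            du_dx = (u[i + 1][j] - u[i - 1][j]) / (2.0 * dx)
            dv_dy = (v[i][j + 1] - v[i][j - 1]) / (2.0 * dy)
            row.append(du_dx + dv_dy)
        div.append(row)
    return div


def max_abs(field):
    """Largest absolute value of a 2D list of numbers."""
    return max(abs(value) for row in field for value in row)


# Analytic divergence-free test field: u = sin(x) cos(y), v = -cos(x) sin(y).
n, h = 32, 0.1
u = [[math.sin(i * h) * math.cos(j * h) for j in range(n)] for i in range(n)]
v = [[-math.cos(i * h) * math.sin(j * h) for j in range(n)] for i in range(n)]
residual = max_abs(divergence_2d(u, v, h, h))
```

For a predicted velocity field resampled on such a grid, a large residual would indicate that the prediction violates mass conservation, whereas the analytic field above yields a residual at the level of floating-point round-off.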


We used OpenFOAM v2112 to generate our simulations. The results can be manipulated and visualized with ParaView and/or with a Python interface such as PyVista. Finally, from a deep learning point of view, these data can be processed with Geometric Deep Learning libraries such as PyTorch Geometric or Deep Graph Library. As those tools and domains are not necessarily well known in the Machine Learning community, we would like to share some tutorials and books that helped us become more comfortable with the subject:

  • One of the OpenFOAM wikis is a must for learning this powerful tool. For just a taste, you can follow the First Glimpse Series, and for a more in-depth introduction, the Three Weeks Series.

  • Concerning ParaView, a part of the Three Weeks Series is dedicated to it but can be followed independently. You can find it here.

  • This book proposes an overview of the mathematics in OpenFOAM.

  • The Turbulence Modeling for CFD book for understanding how to model turbulence in CFD.

  • The Fundamentals of Aerodynamics book for an aerodynamics-centered presentation of fluid dynamics.

  • More fundamentally, the Fluid Mechanics book for a general introduction to fluid mechanics.
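
To give a flavour of the input that Geometric Deep Learning libraries such as PyTorch Geometric work with, the sketch below turns mesh connectivity into a symmetric edge list in the two-row (sources, targets) layout used by PyTorch Geometric's `edge_index`. It is written in plain Python and rests on assumptions: the mesh is taken to be made of triangles given as node-index triplets, which is not necessarily how the AirfRANS cells are stored, and `triangles_to_edge_index` is a hypothetical helper, not a function of any of the libraries mentioned.

```python
def triangles_to_edge_index(triangles):
    """Build a symmetric edge list from triangle connectivity.

    Returns [sources, targets], mirroring a 2 x num_edges layout.
    """
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            # Store both directions so the graph is undirected.
            edges.add((i, j))
            edges.add((j, i))
    ordered = sorted(edges)
    return [[i for i, _ in ordered], [j for _, j in ordered]]


# Two triangles sharing the edge (1, 2): a hypothetical four-node mesh.
edge_index = triangles_to_edge_index([(0, 1, 2), (1, 3, 2)])
```

The shared edge is stored only once per direction, so the two triangles above yield five undirected edges, that is ten directed entries.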


This dataset is under the Open Data Commons Open Database License (ODbL) v1.0 and this library is under the MIT License.

This work is proposed by Extrality and the MLIA team of Sorbonne Université, Paris.