Data processing

Alphabetical list of available ULHPC software belonging to the 'data' category. To load software from this category, use: module load data/<software>[/<version>]

| Software | Versions | Software sets (swsets) | Architectures | Clusters | Description |
| --- | --- | --- | --- | --- | --- |
| Arrow | 0.16.0 | 2019b | broadwell, skylake | iris | Apache Arrow (incl. PyArrow Python bindings), a cross-language development platform for in-memory data. |
| DB_File | 1.855 | 2020b | broadwell, epyc, skylake | aion, iris | Perl5 access to Berkeley DB version 1.x. |
| GDAL | 3.0.2, 3.2.1 | 2019b, 2020b | broadwell, skylake, gpu, epyc | iris, aion | GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats. It also comes with a variety of useful command-line utilities for data translation and processing. |
| HDF5 | 1.10.5, 1.10.7 | 2019b, 2020b | broadwell, skylake, gpu, epyc | iris, aion | HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high-volume and complex data. |
| HDF | 4.2.15 | 2020b | broadwell, epyc, skylake, gpu | aion, iris | HDF (also known as HDF4) is a library and multi-object file format for storing and managing data between machines. |
| LAME | 3.100 | 2019b, 2020b | broadwell, skylake, gpu, epyc | iris, aion | LAME is a high-quality MPEG Audio Layer III (MP3) encoder licensed under the LGPL. |
| XML-LibXML | 2.0201, 2.0206 | 2019b, 2020b | broadwell, skylake, epyc | iris, aion | Perl binding for libxml2. |
| dask | 2021.2.0 | 2020b | broadwell, epyc, skylake, gpu | aion, iris | Dask natively scales Python. It provides advanced parallelism for analytics, so that familiar tools such as NumPy and pandas can run at scale. |
| h5py | 2.10.0, 3.1.0 | 2019b, 2020b | broadwell, skylake, gpu, epyc | iris, aion | HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data. |
| netCDF-Fortran | 4.5.2, 4.5.3 | 2019b, 2020b | broadwell, skylake, epyc | iris, aion | NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. |
| netCDF | 4.7.1, 4.7.4 | 2019b, 2020b | broadwell, skylake, gpu, epyc | iris, aion | NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. |
| scikit-learn | 0.23.2 | 2020b | broadwell, epyc, skylake, gpu | aion, iris | Scikit-learn integrates machine learning algorithms in the tightly-knit scientific Python world, building upon numpy, scipy, and matplotlib. As a machine-learning module, it provides versatile tools for data mining and analysis in any field of science and engineering. It strives to be simple and efficient, accessible to everybody, and reusable in various contexts. |
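To check that the HDF5 stack works after loading it (for example with module load data/h5py, following the syntax above), the following is a minimal sketch of an HDF5 round trip through h5py. It assumes that h5py and NumPy are both importable in the loaded environment; the file, group, dataset, and attribute names are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def write_and_read(path):
    """Write a small array to an HDF5 file, then read it back with its metadata."""
    data = np.arange(12, dtype=np.float64).reshape(3, 4)

    # Write: one group holding a gzip-compressed dataset plus a metadata attribute.
    with h5py.File(path, "w") as f:
        grp = f.create_group("measurements")  # hypothetical group name
        dset = grp.create_dataset("temperature", data=data, compression="gzip")
        dset.attrs["units"] = "K"  # hypothetical attribute

    # Read: datasets are addressed by their path inside the file.
    # Values are read into memory before the file is closed.
    with h5py.File(path, "r") as f:
        dset = f["measurements/temperature"]
        return dset[...], dset.attrs["units"], dset.compression


if __name__ == "__main__":
    # Use a throwaway directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        values, units, compression = write_and_read(os.path.join(tmp, "example.h5"))
        print(values.shape, units, compression)
```

The gzip filter is used here because it ships with standard HDF5 builds, so the sketch should not depend on optional compression plugins. On a real job, point path at a project or scratch directory instead of a temporary one.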

Last update: March 4, 2024