DATA SETS & ML MODELS

The data sets generated during the project from real testbeds and system simulators, along with related research outputs such as ML models and accompanying documentation, are being made available in the MLSysOps community on Zenodo.

Data sets

Ubiwhere Smart Lamppost Dataset

This dataset was collected by Ubiwhere from a smart lamp post installed at the company's headquarters in Aveiro, Portugal. The smart lamp post is equipped with video and sound sensors (camera and microphone) and captures environmental and traffic-related data. [View dataset on Zenodo]

INRIA SHIELD Framework Dataset - 5G Jamming Attack Detection

This dataset contains physical layer (PHY) cellular network traces collected from an Android smartphone (OnePlus Nord 2T 5G) under 5G jamming attack scenarios. It serves as the official training and validation data for the SHIELD Framework. [View dataset on Zenodo]

INRIA I/Q Signal Dataset for RF Fingerprinting and Physical Layer Authentication

This dataset contains raw I/Q (In-Phase/Quadrature) radio signal traces collected using a BladeRF AX4 Software Defined Radio (SDR) and GNU Radio. It serves as the official training and validation data for the PLA-AP (Physical Layer Authentication) project, which evaluates machine learning approaches for identifying wireless devices based on their hardware impairments (RF fingerprints). [View dataset on Zenodo]

Augmenta Tractor-Drone Co-Robotics Dataset for Weed Detection

This dataset contains telemetry and computer vision metrics collected from a smart agriculture co-robotics system developed by Augmenta (acquired by CNH Industrial). The system consists of a tractor equipped with a “Field Analyzer” and an autonomous drone (UAV). The data was collected to train machine learning models (specifically XGBoost) to predict the should_fly event, a signal that triggers the drone to launch and assist the tractor when the tractor’s onboard cameras are blinded by environmental factors (e.g., sun glare or lens flare). [View dataset on Zenodo]

TUD Telemetry Dataset for Anomaly Detection

This dataset contains snapshots of host telemetry metrics collected during different workload conditions. It is intended for training and evaluating anomaly detection models (e.g., reconstruction-based autoencoders). [View dataset on Zenodo]

Chocolate Cloud Object Storage Transfer Speeds Dataset

This dataset measures upload and download performance between Fly.io gateway regions (origins) and commercial object storage backends (targets). Each row is one measurement for a specific data size, initiated from a Fly.io region and recorded against a particular backend, and is intended for studying network performance, latency-sensitive placement, and cross-region transfer behavior. [View dataset on Zenodo]

NVIDIA/UTH Job Placement Failure Dataset for Simulated Datacenter Clusters with Reconfigurable Optical Networks

This dataset contains cluster-level snapshots and job placement outcomes generated using a simulated large-scale datacenter environment. The data is intended for training and evaluating machine learning models that predict whether a job submission will succeed or fail given the current cluster state and job resource request. [View dataset on Zenodo]

UTH FPGA Telemetry Dataset for ML Inference Experiments on AMD/Xilinx ZCU102 MPSoC Development Board

This dataset contains telemetry traces from repeated machine learning inference experiments executed on a Xilinx ZCU102 FPGA platform. Each experiment corresponds to a specific DPU bitstream configuration (DPU size and number of DPU compute units), a model variant (including pruning variants), and a system workload mode applied on the ARM CPU. [View dataset on Zenodo]

ML models

UCD Cluster VM Management Model

This repository contains a Deep Reinforcement Learning agent (trained using Maskable PPO) for optimizing Virtual Machine placement and lifecycle management. The model is exported as a platform-independent ONNX file for easy deployment. [View model on Zenodo]

INRIA 5G Jamming Attack Detection - LSTM Model

This model was developed by INRIA as part of SHIELD, a research framework designed to evaluate machine-learning-based approaches for detecting jamming and interference in 5G networks under realistic conditions. [View model on Zenodo]

INRIA RF Fingerprinting Model Collection for Physical Layer Authentication

This repository contains a comprehensive collection of machine learning models developed by INRIA. These models implement Physical Layer Authentication (PLA) using RF Fingerprinting. They are designed to secure wireless networks by identifying devices based on the unique physical characteristics (fingerprints) of their radio hardware, rather than just their digital credentials. This record provides a “Model Zoo” covering various experimental scenarios (Trials), machine learning architectures, and feature selection techniques. [View models on Zenodo]

Chocolate Cloud/UCD SkyFlok Latency Prediction: Gradient Boosting Models

This repository contains a collection of machine learning models developed in collaboration between University College Dublin (UCD) and Chocolate Cloud (CC). These models are deployed within the SkyFlok Gateway component (hosted in London). They perform Latency Prediction to estimate the time required to retrieve a file from specific cloud storage backends. [View models on Zenodo]

Ubiwhere/UCD Smart Lamppost Noise Prediction LSTM Model

This repository contains a machine learning model developed in collaboration between University College Dublin (UCD) and Ubiwhere as part of the MLSysOps project. The model is deployed on edge devices within Smart Lampposts in Aveiro, Portugal. It performs Noise Level Prediction to estimate future environmental noise levels based on real-time traffic and pedestrian activity. [View model on Zenodo]

Augmenta/UCD Drone Deployment Prediction Model

This repository contains a machine learning model developed by University College Dublin (UCD) for Augmenta (acquired by CNH Industrial) as part of the MLSysOps project, focusing on drone deployment prediction. The model predicts the should_fly signal for drone operations, leveraging temporal sensor and flight data to anticipate deployment needs ahead of time. This enables proactive drone management, accounting for operational delays and improving decision-making in real-world scenarios. [View model on Zenodo]

NTT DATA 5G Latency Optimization RL Prediction Model

This repository contains a machine learning model developed by NTT DATA as part of the MLSysOps project. This artifact provides an ONNX export (opset 18) of a Deep Q-Network (DQN) agent trained to select the best data center (among 3) for a client request in a 5G/MEC setting, optimizing latency-related outcomes (and incorporating carbon intensity as a feature). Given the current state of three candidate data centers, the model outputs Q-values for each possible selection and chooses the data center with the highest Q-value. [View model on Zenodo]
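The selection step described above (pick the data center whose Q-value is highest) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_data_center` and the example Q-values are hypothetical, and the code assumes the Q-values have already been obtained from the model. It does not reproduce the model's actual input format or inference code.

```python
from typing import Sequence


def select_data_center(q_values: Sequence[float]) -> int:
    """Return the index of the candidate with the highest Q-value.

    Ties are resolved in favour of the lowest index (an assumption;
    the published artifact may break ties differently).
    """
    if len(q_values) == 0:
        raise ValueError("q_values must not be empty")
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical Q-values for the three candidate data centers.
example_q = [0.12, 0.87, 0.45]
chosen = select_data_center(example_q)
print(f"Selected data center index: {chosen}")
```

In practice the Q-values would come from running the ONNX file in an inference runtime; the greedy choice over the output vector is the only part shown here.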

TUD Anomaly Detection Model

This repository contains a trained Autoencoder-based anomaly detection model developed in the context of the MLSysOps project. The model performs unsupervised anomaly detection on node/VM telemetry metrics by learning to reconstruct normal observations. [View model on Zenodo]
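Reconstruction-based detection of this kind is commonly implemented by comparing each observation with its reconstruction and flagging it when the error exceeds a threshold. The sketch below illustrates that general idea only. The mean-squared-error score, the threshold value, and the stand-in `reconstruct` function are all assumptions made for illustration, and the published model may use a different scoring rule.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an observation and its reconstruction."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def is_anomalous(
    x: Vector,
    reconstruct: Callable[[Vector], Vector],
    threshold: float,
) -> bool:
    """Flag x as anomalous if its reconstruction error exceeds threshold."""
    return reconstruction_error(x, reconstruct(x)) > threshold


# Hypothetical stand-in for a trained autoencoder: it always reproduces
# the "normal" pattern it was (notionally) trained on.
def reconstruct(x: Vector) -> Vector:
    return [0.5 for _ in x]


normal_sample = [0.5, 0.5, 0.5]
odd_sample = [1.0, 0.0, 0.5]
print(is_anomalous(normal_sample, reconstruct, threshold=0.01))
print(is_anomalous(odd_sample, reconstruct, threshold=0.01))
```

A model trained only on normal telemetry reconstructs normal samples well, so a large error suggests the observation lies outside what the model has learned.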

UTH VM Utilization and Remaining Lifetime Predictor Model

This repository contains PeakLife, a lightweight neural model exported to ONNX for portable inference. Given the historic utilization information for a VM, PeakLife predicts (a) Future CPU utilization: AvgCPU and MaxCPU (normalized), and (b) Remaining lifetime: normalized remaining lifetime (and seconds via scaling). [View model on Zenodo]

NVIDIA/UTH ML Model for Predicting Job Placement Failures in Datacenter Clusters

This repository contains a trained binary classification model, exported to ONNX, that predicts whether a submitted job will fail or run successfully, given the current state of a simulated datacenter cluster and the resource request of the incoming job. [View model on Zenodo]

UTH Reinforcement Learning Policy Model for Dynamic FPGA DPU Configuration Selection

This repository provides a trained reinforcement-learning policy model, exported to ONNX, that selects an FPGA DPU configuration—defined by DPU size and number of DPU compute units (instances)—given an observation vector describing the current system and job context. [View model on Zenodo]

UTH Keep-alive Time Prediction Model for Serverless Function Enclaves

This repository contains the artifact for LACE-RL, a reinforcement-learning framework that adaptively selects keep-alive durations to co-optimize cold-start latency and carbon emissions under time-varying grid carbon intensity. It accompanies the paper “Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing”, accepted at CCGrid 2026. [View artifact on Zenodo]