Data Sets & ML Models
The data sets generated during the project from real testbeds and system simulators, along with related research outputs such as ML models and accompanying documentation, are being made available in the MLSysOps community on Zenodo.
Data sets
Ubiwhere Smart Lamppost Dataset
This dataset was collected by Ubiwhere from a smart lamppost installed at the company's headquarters in Aveiro, Portugal. The smart lamppost is equipped with video and sound sensors (camera and microphone) and captures environmental and traffic-related data. [View dataset on Zenodo]
INRIA SHIELD Framework Dataset - 5G Jamming Attack Detection
This dataset contains physical layer (PHY) cellular network traces collected from an Android smartphone (OnePlus Nord 2T 5G) under 5G jamming attack scenarios. It serves as the official training and validation data for the SHIELD Framework. [View dataset on Zenodo]
INRIA I/Q Signal Dataset for RF Fingerprinting and Physical Layer Authentication
This dataset contains raw I/Q (in-phase/quadrature) radio signal traces collected using a BladeRF AX4 Software Defined Radio (SDR) and GNU Radio. It serves as the official training and validation data for the PLA-AP (Physical Layer Authentication) project and is designed for evaluating machine learning approaches that identify wireless devices based on their hardware impairments (RF fingerprints). [View dataset on Zenodo]
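For readers new to I/Q data, the following minimal sketch shows how a pair of in-phase and quadrature values can be read as one complex sample with an amplitude and a phase. The function name `iq_to_polar` and the sample values are made up for illustration; the sketch says nothing about how the dataset's files are actually stored.

```python
import cmath


def iq_to_polar(i_values, q_values):
    """Convert paired I and Q samples to (amplitude, phase in radians) tuples."""
    if len(i_values) != len(q_values):
        raise ValueError("I and Q sequences must have the same length")
    # Each sample is treated as the complex number I + jQ.
    return [cmath.polar(complex(i, q)) for i, q in zip(i_values, q_values)]


# Hypothetical samples, not taken from the dataset.
polar = iq_to_polar([1.0, 0.0, -1.0], [0.0, 1.0, 0.0])
```

Amplitude and phase are only two of many quantities that can be derived from raw I/Q samples; which features the published models actually use is described in the Zenodo records.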
Augmenta Tractor-Drone Co-Robotics Dataset for Weed Detection
This dataset contains telemetry and computer vision metrics collected from a smart agriculture co-robotics system developed by Augmenta (acquired by CNH Industrial). The system consists of a tractor equipped with a “Field Analyzer” and an autonomous drone (UAV). The data was collected to train machine learning models (specifically XGBoost) to predict the should_fly event, a signal that triggers the drone to launch and assist the tractor when the tractor’s onboard cameras are blinded by environmental factors (e.g., sun glare or lens flare). [View dataset on Zenodo]
TUD Telemetry Dataset for Anomaly Detection
This dataset contains snapshots of host telemetry metrics collected during different workload conditions. It is intended for training and evaluating anomaly detection models (e.g., reconstruction-based autoencoders). [View dataset on Zenodo]
Chocolate Cloud Object Storage Transfer Speeds Dataset
This dataset measures upload and download performance between Fly.io gateway regions (origins) and commercial object storage backends (targets). Each row is one measurement for a specific data size, initiated from a Fly.io region and recorded against a particular backend, and is intended for studying network performance, latency-sensitive placement, and cross-region transfer behavior. [View dataset on Zenodo]
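As a rough illustration of how per-row measurements like these might be summarized, the sketch below averages throughput for each origin and target pair. The field names (`origin`, `target`, `size_bytes`, `seconds`) and the sample rows are assumptions made for this example; the actual column names and units should be checked against the Zenodo record.

```python
from collections import defaultdict


def mean_throughput_mb_per_s(rows):
    """Average throughput in megabytes per second for each (origin, target) pair."""
    samples = defaultdict(list)
    for row in rows:
        # Hypothetical fields: transferred bytes and elapsed seconds per measurement.
        rate = row["size_bytes"] / row["seconds"] / 1_000_000
        samples[(row["origin"], row["target"])].append(rate)
    return {pair: sum(rates) / len(rates) for pair, rates in samples.items()}


# Made-up rows, not taken from the dataset.
rows = [
    {"origin": "region-a", "target": "backend-x", "size_bytes": 10_000_000, "seconds": 2.0},
    {"origin": "region-a", "target": "backend-x", "size_bytes": 10_000_000, "seconds": 4.0},
    {"origin": "region-b", "target": "backend-x", "size_bytes": 5_000_000, "seconds": 1.0},
]
result = mean_throughput_mb_per_s(rows)
```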
ML models
UCD Cluster VM Management Model
This repository contains a Deep Reinforcement Learning agent (trained using Maskable PPO) for optimizing Virtual Machine placement and lifecycle management. The model is exported as a platform-independent ONNX file for easy deployment. [View model on Zenodo]
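Maskable PPO restricts the policy to actions that are valid in the current state. The general idea of invalid action masking can be sketched in plain Python as below; this is an illustration of the technique, not the project's training code, and the function name, logits, and mask are hypothetical.

```python
import math


def masked_action_probabilities(logits, valid_mask):
    """Softmax over logits, restricted to actions whose mask entry is True."""
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, valid_mask) if ok)
    # Invalid actions get zero weight, so they can never be sampled.
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: action 0 is invalid in the current state.
probs = masked_action_probabilities([2.0, 1.0, 0.5], [False, True, True])
```

Even though action 0 has the highest raw logit, it receives zero probability, and the remaining probability mass is shared among the valid actions.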
INRIA 5G Jamming Attack Detection - LSTM Model
This model was developed by INRIA as part of SHIELD, a research framework designed to evaluate machine-learning-based approaches for detecting jamming and interference in 5G networks under realistic conditions. [View model on Zenodo]
INRIA RF Fingerprinting Model Collection for Physical Layer Authentication
This repository contains a comprehensive collection of machine learning models developed by INRIA. These models implement Physical Layer Authentication (PLA) using RF Fingerprinting. They are designed to secure wireless networks by identifying devices based on the unique physical characteristics (fingerprints) of their radio hardware, rather than just their digital credentials. This record provides a “Model Zoo” covering various experimental scenarios (Trials), machine learning architectures, and feature selection techniques. [View models on Zenodo]
Chocolate Cloud/UCD SkyFlok Latency Prediction: Gradient Boosting Models
This repository contains a collection of machine learning models developed in collaboration between University College Dublin (UCD) and Chocolate Cloud (CC). These models are deployed within the SkyFlok Gateway component (hosted in London). They perform Latency Prediction to estimate the time required to retrieve a file from specific cloud storage backends. [View models on Zenodo]
Ubiwhere/UCD Smart Lamppost Noise Prediction LSTM Model
This repository contains a machine learning model developed in collaboration between University College Dublin (UCD) and Ubiwhere as part of the MLSysOps project. The model is deployed on edge devices within Smart Lampposts in Aveiro, Portugal. It performs Noise Level Prediction to estimate future environmental noise levels based on real-time traffic and pedestrian activity. [View model on Zenodo]
Augmenta/UCD Drone Deployment Prediction Model
This repository contains a machine learning model developed by University College Dublin (UCD) for Augmenta (acquired by CNH Industrial) as part of the MLSysOps project, focusing on drone deployment prediction. The model predicts the should_fly signal for drone operations, leveraging temporal sensor and flight data to anticipate deployment needs ahead of time. This enables proactive drone management, accounting for operational delays and improving decision-making in real-world scenarios. [View model on Zenodo]
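One common way to train a model that anticipates an event is to label each time step by whether the event occurs within a fixed look-ahead window. The sketch below illustrates that idea only; the function name, the `horizon` parameter, and the sample sequence are assumptions, and the labeling scheme actually used for this model is documented in the Zenodo record.

```python
def lead_time_labels(should_fly, horizon):
    """Label step t as 1 if should_fly is raised at any step in (t, t + horizon]."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Slices that run past the end of the sequence are simply shorter.
    return [
        int(any(should_fly[t + 1 : t + 1 + horizon]))
        for t in range(len(should_fly))
    ]


# Hypothetical signal: the event fires at step 3.
labels = lead_time_labels([0, 0, 0, 1, 0, 0], horizon=2)
# labels == [0, 1, 1, 0, 0, 0]
```

With a two-step horizon, the two steps before the event are marked positive, which gives a model trained on these labels a chance to raise the signal before the event itself.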
UTH VM Utilization and Remaining Lifetime Predictor Model
This repository contains PeakLife, a lightweight neural model exported to ONNX for portable inference. Given the historical utilization of a VM, PeakLife predicts (a) future CPU utilization, as normalized AvgCPU and MaxCPU values, and (b) remaining lifetime, as a normalized value that can be converted to seconds via scaling. [View model on Zenodo]
NTT DATA 5G Latency Optimization RL Prediction Model
This repository contains a machine learning model developed by NTT DATA as part of the MLSysOps project. This artifact provides an ONNX export (opset 18) of a Deep Q-Network (DQN) agent trained to select the best data center (among 3) for a client request in a 5G/MEC setting, optimizing latency-related outcomes (and incorporating carbon intensity as a feature). Given the current state of three candidate data centers, the model outputs Q-values for each possible selection and chooses the data center with the highest Q-value. [View model on Zenodo]
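The selection step described above is a greedy choice over the model's outputs. A minimal sketch, assuming the three Q-values have already been obtained from the model (the function name and the values are hypothetical):

```python
def select_data_center(q_values):
    """Return the index of the data center with the highest Q-value."""
    if len(q_values) != 3:
        raise ValueError("expected one Q-value per candidate data center (3)")
    # Ties resolve to the lowest index because max() keeps the first maximum.
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical Q-values for data centers 0, 1, and 2.
choice = select_data_center([0.12, 0.87, 0.45])
```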
TUD Anomaly Detection Model
This repository contains a trained Autoencoder-based anomaly detection model developed in the context of the MLSysOps project. The model performs unsupervised anomaly detection on node/VM telemetry metrics by learning to reconstruct normal observations. [View model on Zenodo]
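Reconstruction-based detection usually scores each observation by how poorly the model reproduces it and flags scores above a threshold. The sketch below shows that scoring step only, with the autoencoder's output passed in as a plain list; the metric values, the mean-squared-error score, and the threshold of 0.01 are assumptions for illustration, not parameters of the published model.

```python
def reconstruction_error(observed, reconstructed):
    """Mean squared error between a telemetry snapshot and its reconstruction."""
    if len(observed) != len(reconstructed):
        raise ValueError("observed and reconstructed must have the same length")
    if not observed:
        raise ValueError("at least one metric is required")
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed)) / len(observed)


def is_anomalous(observed, reconstructed, threshold):
    """Flag the snapshot when its reconstruction error exceeds the threshold."""
    return reconstruction_error(observed, reconstructed) > threshold


# Hypothetical normalized metrics and reconstructions.
THRESHOLD = 0.01  # assumed value; in practice derived from errors on normal data
normal_flag = is_anomalous([0.20, 0.35, 0.10], [0.21, 0.33, 0.11], THRESHOLD)
spike_flag = is_anomalous([0.95, 0.35, 0.10], [0.30, 0.34, 0.12], THRESHOLD)
```

Because the model is trained only to reconstruct normal behavior, an unusual snapshot such as the second one tends to be reconstructed poorly and therefore scores above the threshold.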
