The MLSysOps Framework
The MLSysOps Framework, developed as the open-source outcome of the MLSysOps EU project, is an extensible, modular platform, licensed under Apache-2.0, for autonomic, explainable, and adaptive system management across the heterogeneous computing continuum, from centralized cloud infrastructure to resource-constrained far-edge devices.
Designed to reduce the need for human monitoring and manual configuration, the framework uses AI-driven decision-making and machine learning models to dynamically optimize resource utilization, application deployment, and system performance in real time.
Architecture & Core Principles

The framework's design rests on a hierarchical, agent-based architecture driven by the MAPE-K paradigm: a control loop of Monitor, Analyze, Plan, and Execute phases that share a common Knowledge base. It introduces the concept of a system slice, a logical grouping of computing, storage, and networking resources across the continuum that is managed as a self-contained unit through an instance of the MLSysOps control plane.
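To make the MAPE-K idea concrete, here is a minimal, generic sketch of one loop iteration in Python. It is not the MLSysOps implementation: the class names, the single CPU-load metric, the 0.8 threshold, and the "add one replica" action are all hypothetical choices made only to show how the four phases hand results to each other through shared knowledge.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Knowledge:
    """Shared state read and updated by every MAPE-K phase (illustrative)."""
    cpu_threshold: float = 0.8   # hypothetical SLA: load per replica must stay below 0.8
    replicas: int = 1            # current deployment size
    history: list = field(default_factory=list)


class MapeKLoop:
    """Minimal MAPE-K loop for one managed unit. Hypothetical, not the MLSysOps API."""

    def __init__(self, read_load: Callable[[], float], knowledge: Knowledge):
        self.read_load = read_load  # telemetry source (hypothetical)
        self.k = knowledge

    def monitor(self) -> float:
        # Monitor: collect a telemetry sample and record it in Knowledge.
        load = self.read_load()
        self.k.history.append(load)
        return load

    def analyze(self, load: float) -> bool:
        # Analyze: detect a symptom, here load per replica above the threshold.
        return load / self.k.replicas > self.k.cpu_threshold

    def plan(self, overloaded: bool) -> Optional[int]:
        # Plan: propose one extra replica when overloaded, otherwise no change.
        return self.k.replicas + 1 if overloaded else None

    def execute(self, target: Optional[int]) -> None:
        # Execute: apply the plan; a real agent would call an orchestrator here.
        if target is not None:
            self.k.replicas = target

    def step(self) -> None:
        self.execute(self.plan(self.analyze(self.monitor())))


# Aggregate load expressed in units of one replica's capacity.
samples = iter([0.5, 1.2, 1.2, 0.9])
loop = MapeKLoop(lambda: next(samples), Knowledge())
for _ in range(4):
    loop.step()
print(loop.k.replicas)  # → 2
```

Keeping all state in the `Knowledge` object means that each phase stays a small, replaceable function. This mirrors the motivation for plugging in different policies or ML models without touching the loop itself.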
Key architectural components include:
- Node Agents for node-level control on edge and far-edge devices
- Cluster Agents to manage resource domains comprising multiple nodes
- A global Continuum Agent to coordinate cross-domain decisions within a given system slice
- Flexible, dynamic ML model integration, enabling explainable, continually trained, modular models to be added and invoked as needed
- Northbound and Southbound interfaces for interaction with external clients and for interoperability with external systems and orchestrators (e.g., Kubernetes)
- A Command Line Interface (CLI) that connects to the Northbound API of a given system slice and supports basic operations such as deploying applications and monitoring their execution
The framework operates as an abstraction middleware, managing diverse infrastructure layers and supporting data-driven autonomy.
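The Node, Cluster, and Continuum Agent hierarchy described above can be illustrated with a small placement example. This is a sketch under stated assumptions: it assumes a simple top-down delegation where the Continuum Agent asks each Cluster Agent, and each Cluster Agent asks its Node Agents, using first-fit on free CPU. The classes, the `Request` type, and the node names are hypothetical and do not reflect how MLSysOps agents actually negotiate decisions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical placement request for one application component."""
    component: str
    cpu: float  # cores required


class NodeAgent:
    def __init__(self, name: str, free_cpu: float):
        self.name = name
        self.free_cpu = free_cpu

    def try_place(self, req: Request) -> Optional[str]:
        # Node scope: accept only if this single node has enough free CPU.
        if req.cpu <= self.free_cpu:
            self.free_cpu -= req.cpu
            return self.name
        return None


class ClusterAgent:
    def __init__(self, name: str, nodes: List[NodeAgent]):
        self.name = name
        self.nodes = nodes

    def try_place(self, req: Request) -> Optional[str]:
        # Cluster scope: first-fit over the nodes in this resource domain.
        for node in self.nodes:
            placed = node.try_place(req)
            if placed is not None:
                return f"{self.name}/{placed}"
        return None


class ContinuumAgent:
    def __init__(self, clusters: List[ClusterAgent]):
        self.clusters = clusters

    def place(self, req: Request) -> Optional[str]:
        # Slice scope: cross-domain decision spanning all clusters.
        for cluster in self.clusters:
            placed = cluster.try_place(req)
            if placed is not None:
                return placed
        return None


edge = ClusterAgent("edge", [NodeAgent("rpi-1", 1.0), NodeAgent("jetson-1", 4.0)])
cloud = ClusterAgent("cloud", [NodeAgent("vm-1", 16.0)])
continuum = ContinuumAgent([edge, cloud])
print(continuum.place(Request("camera-reader", 0.5)))  # edge/rpi-1
print(continuum.place(Request("detector", 2.0)))       # edge/jetson-1
print(continuum.place(Request("trainer", 8.0)))        # cloud/vm-1
```

First-fit is used here only because it is easy to follow. In a real deployment, each level would apply its own policy or ML model at its own scope.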
Key Features
- Kubernetes-native deployment using Custom Resource Definitions (CRDs)
- Multi-cluster orchestration powered by Karmada
- Dynamic, easily extensible telemetry and observability layer
- Plugin systems for configuration policies and custom mechanisms
- Northbound REST API service for external applications or CLI use
- ML Connector Service for seamless model deployment, retraining, and explainability integration
- Support for far-edge devices and multiple container runtimes
- System inventory and application targeting via CRDs
- Built-in storage and resource management across the continuum
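Because applications and inventory are described through CRDs, an application description is ultimately a Kubernetes custom-resource manifest with the standard `apiVersion`, `kind`, `metadata`, and `spec` fields. The sketch below builds such a manifest in Python. The group/version `example.org/v1alpha1`, the kind `ApplicationDescription`, and the `targetCluster` and `components` spec fields are hypothetical placeholders, not the actual MLSysOps CRD schema.

```python
import json
from typing import Dict, List


def make_app_resource(name: str, components: List[Dict[str, str]],
                      target_cluster: str) -> dict:
    """Build a custom-resource manifest describing an application.

    Group, version, kind, and all spec fields are hypothetical placeholders,
    not the MLSysOps CRD schema.
    """
    return {
        "apiVersion": "example.org/v1alpha1",  # hypothetical <group>/<version>
        "kind": "ApplicationDescription",      # hypothetical kind
        "metadata": {"name": name},
        "spec": {
            "targetCluster": target_cluster,   # hypothetical targeting field
            "components": components,
        },
    }


manifest = make_app_resource(
    "object-detection",
    [{"name": "camera-reader", "image": "registry.example.org/camera:0.1"}],
    target_cluster="edge",
)
# kubectl accepts JSON as well as YAML, so a file written from this output could
# be applied with `kubectl apply -f app.json` once a matching CRD is installed.
print(json.dumps(manifest, indent=2))
```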
Use Cases
- Autonomic application deployment with ML-driven optimization
- Adaptive system configuration based on workload and context-aware policies
- Policy-based orchestration to meet application- and system-level SLAs
- Explainable AI integration for transparent system decisions
- Resource-aware deployment for IoT and edge environments with constrained hardware
Explore the Code / Get Started
GitHub Repository: github.com/mlsysops-eu/mlsysops-framework