Join us for the live launch of the MLSysOps open-source framework!

🗓 Date: Fri, Jul 18th, 2025
🕒 Time: 13:00 CET
📍 Where: Online (Zoom)

In this hands-on session, we will:

– Introduce the MLSysOps framework and why we built it

– Set up a testbed from scratch using our provided scripts

– Deploy the system and run a real-world example

– Showcase our policy API – and give a sneak peek at how ML blends in

– Walk you through what’s next and how to get involved

Whether you’re running workloads at the edge, in the cloud, or anywhere in between, MLSysOps brings smart automation and control where you need it.

Register now: https://forms.gle/VzbEykjU1XnT8Rys8
Project repo: https://github.com/mlsysops-eu/mlsysops-framework

The MLSysOps Framework

The MLSysOps Framework, developed as the open-source outcome of the MLSysOps EU Project, provides an extensible and modular platform (licensed under Apache-2.0) for autonomic, explainable, and adaptive system management across the heterogeneous computing continuum—from centralized cloud infrastructures to resource-constrained far-edge devices.

Designed to reduce human monitoring and manual configuration, the framework leverages AI-powered decision-making and machine learning models to dynamically optimize resource utilization, application deployment, and system performance in real time.

Architecture & Core Principles


The design of the framework is grounded in a hierarchical, agent-based architecture driven by the MAPE-K loop (Monitor, Analyze, Plan, Execute – Knowledge) paradigm. It introduces the concept of a system slice—a logical grouping of computing, storage, and networking resources across the continuum—managed as a self-contained unit through an instance of the MLSysOps control plane.
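To make the MAPE-K paradigm concrete, here is a minimal Python sketch of one pass through the loop. It is an illustration only and not code from the MLSysOps repository: the `Knowledge` class, the `cpu_utilization` metric, the 0.75 target, and the `scale_out`/`scale_in` action names are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Knowledge:
    """Shared state (the "K" in MAPE-K). Hypothetical fields for illustration."""
    cpu_target: float = 0.75
    history: List[float] = field(default_factory=list)


def monitor(telemetry: Dict[str, float], knowledge: Knowledge) -> float:
    # Monitor: read a metric and record it in the knowledge base.
    cpu = telemetry["cpu_utilization"]
    knowledge.history.append(cpu)
    return cpu


def analyze(cpu: float, knowledge: Knowledge) -> Optional[str]:
    # Analyze: turn the raw metric into a symptom, or None if all is well.
    if cpu > knowledge.cpu_target:
        return "overloaded"
    if cpu < knowledge.cpu_target / 2:
        return "underused"
    return None


def plan(symptom: Optional[str]) -> Optional[str]:
    # Plan: map a symptom to an action name (assumed names, not a real API).
    actions = {"overloaded": "scale_out", "underused": "scale_in"}
    return actions.get(symptom) if symptom is not None else None


def execute(action: Optional[str], actions_log: List[str]) -> None:
    # Execute: a real agent would call a mechanism; here we only log the action.
    if action is not None:
        actions_log.append(action)


def mape_k_step(telemetry: Dict[str, float], knowledge: Knowledge,
                actions_log: List[str]) -> Optional[str]:
    """Run one Monitor -> Analyze -> Plan -> Execute pass."""
    cpu = monitor(telemetry, knowledge)
    symptom = analyze(cpu, knowledge)
    action = plan(symptom)
    execute(action, actions_log)
    return action
```

Calling `mape_k_step` repeatedly with fresh telemetry gives the closed control loop; in this sketch the knowledge base only accumulates history, whereas a learning-based agent would also use it to adjust thresholds or models over time.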

Key architectural components include:

  • Node Agents for control at the edge and far-edge node level
  • Cluster Agents to manage resource domains comprising multiple nodes
  • A global Continuum Agent to coordinate cross-domain decisions within a given system slice
  • Flexible and dynamic ML model integration, enabling the addition and invocation of explainable, continually trained, and modular intelligence
  • Northbound/Southbound interfaces for interaction with external clients and interoperability with external systems and orchestrators (e.g., Kubernetes)
  • A Command Line Interface (CLI) that connects to the Northbound API of a given system slice and supports the basic operations for deploying applications and monitoring their execution

The framework operates as an abstraction middleware, managing diverse infrastructure layers and supporting data-driven autonomy.
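The hierarchy of agents described above can be pictured with a small Python sketch in which a placement request is tried in a preferred cluster first and falls back to other clusters in the slice if no node there has capacity. This is a simplified model under stated assumptions, not the framework's implementation: the class fields, the `place` methods, and the "most free CPU wins" rule are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NodeAgent:
    """Represents one node; free_cpu is in cores (assumed unit)."""
    name: str
    free_cpu: float

    def can_host(self, cpu_request: float) -> bool:
        return self.free_cpu >= cpu_request


@dataclass
class ClusterAgent:
    """Manages a resource domain made up of several nodes."""
    name: str
    nodes: List[NodeAgent]

    def place(self, cpu_request: float) -> Optional[str]:
        # Pick the node with the most free CPU that can still fit the request.
        candidates = [n for n in self.nodes if n.can_host(cpu_request)]
        if not candidates:
            return None  # this cluster cannot satisfy the request
        best = max(candidates, key=lambda n: n.free_cpu)
        best.free_cpu -= cpu_request
        return best.name


@dataclass
class ContinuumAgent:
    """Coordinates cross-cluster decisions within one system slice."""
    clusters: List[ClusterAgent]

    def place(self, cpu_request: float,
              preferred: str) -> Optional[Tuple[str, str]]:
        # Try the preferred cluster first, then the others in their given order.
        ordered = sorted(self.clusters, key=lambda c: c.name != preferred)
        for cluster in ordered:
            node = cluster.place(cpu_request)
            if node is not None:
                return (cluster.name, node)
        return None
```

For example, with an `edge` cluster of two small nodes and a `cloud` cluster of one large node, a first request preferring `edge` lands on the larger edge node, and a second identical request spills over to the cloud once the edge nodes are too full.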

Key Features

  • Kubernetes-native deployment using Custom Resource Definitions (CRDs)
  • Multi-cluster orchestration powered by Karmada
  • Dynamic, easily extensible telemetry and observability layer
  • Plugin systems to support configuration policies and custom mechanisms
  • Northbound REST API service for external applications or CLI use
  • ML Connector Service for seamless model deployment, retraining, and explainability integration
  • Support for far-edge devices and multiple container runtimes
  • System inventory and application targeting via CRDs
  • Built-in storage and resource management across the continuum

Use Cases

  • Autonomic application deployment with ML-driven optimization
  • Adaptive system configuration based on workload and context-aware policies
  • Policy-based orchestration to meet application and system-level SLAs
  • Explainable AI integration for transparent system decisions
  • Resource-aware deployment for IoT and edge environments with constrained hardware

Explore the Code / Get Started
