Swiss Conference on Data Science 2022: MLOps in der Praxis


On 22 and 23 June, the Swiss Conference on Data Science brings together practitioners, learners, and the curious to broaden their knowledge and their networks within the field of analytics. The first day opens its doors with a series of workshops, ranging from Managing the End-to-End Machine Learning Lifecycle to Developing Fair Algorithms. The second day of the conference brings a multitude of seminars at the renowned KKL in Luzern, covering the business implementation of machine learning models as well as future trends across the entire domain.

The workshop case study

For Allgeier, the conference offers the opportunity to exchange ideas and keep its analytics team sharp on new business practices. One of these was the workshop The Full Machine Learning Lifecycle. Developed by Steffen Terhaar, Tim Rohner, Bernhard Venneman, Spyros Cavadias, and Roman Moser from the consulting firm D ONE, the workshop dives into a machine learning (ML) case study that depicts and practically implements a machine learning operations (MLOps) pipeline from scoping to deployment using open-source tools. In essence, it showcases how the DevOps principles of software engineering translate to data science and machine learning.

To give you a brief overview of what this entails: MLOps is a Machine Learning (ML) engineering practice that aims to unify ML system development (Dev) and ML system operation (Ops). It serves as a counterpart to the DevOps practice in classical software development, which involves Continuous Integration (CI) and Continuous Deployment (CD). Practicing MLOps means advocating for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management.

The Full Machine Learning Lifecycle workshop brings together approximately 20 data scientists and analysts from all over Europe. The journey starts in the Great Hall of the Metropol Hotel in Zurich, where everyone takes their seats. As the projector powers on and the D ONE team takes center stage, the lecturers explain that the workshop case study focuses on building and productionalizing a machine learning model that predicts turbine malfunctions from a dataset produced by Winji.

To understand why this dataset is so interesting, here is a brief overview of what Winji does: the Zurich-based, data-driven company offers a platform that derives AI-based insights from the components of renewable energy assets and from environmental factors, providing wind and solar farms with useful recommendations and accurate forecasts to optimize the assets' overall output.

It should be noted that the workshop does not focus on which ML algorithm is implemented. In the end, this is not what matters: from supervised regression to unsupervised clustering models, every application is different. The focus is always on the entirety of the pipeline and its respective steps within the MLOps life cycle.
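As a loose illustration of that point (not code from the workshop), the sketch below treats the model as just one interchangeable step inside a fixed pipeline; the function names, the toy data, and the trivial threshold "model" are all hypothetical stand-ins:

```python
# Minimal sketch: the pipeline is a fixed sequence of steps,
# while the model inside it is an interchangeable component.

def ingest():
    # Hypothetical stand-in for loading a turbine dataset: (feature, label) rows.
    return [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def preprocess(rows):
    # Toy min-max normalization of the single feature.
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in rows]

def threshold_model(rows):
    # Trivial "model": predict 1 above the mean feature value.
    mean = sum(x for x, _ in rows) / len(rows)
    return lambda x: int(x > mean)

def train(rows, model):
    # Any model-building callable could slot in here without
    # changing the surrounding pipeline.
    return model(rows)

def evaluate(predict, rows):
    correct = sum(predict(x) == y for x, y in rows)
    return correct / len(rows)

data = preprocess(ingest())
predict = train(data, threshold_model)
accuracy = evaluate(predict, data)
```

Swapping `threshold_model` for any other learner leaves the ingest, preprocess, and evaluate stages untouched, which is exactly why the workshop can stay algorithm-agnostic.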

As the lecturers conclude their explanations and the participants gain a good overview of the provided data, three important facts come to light:

MLOps Popularity
  • The search popularity of Machine Learning is slowly but surely being overtaken by MLOps, with the two search terms crossing over around late 2021.
  • From a business perspective, developing a machine learning model is only a small piece of the puzzle when it comes to practically implementing it; MLOps covers the whole.
  • The number of useful open-source tools for MLOps is massive.

 

Since the range of open-source tools is so broad, the lecturers define the open-source stack for the participants as follows:

The entire case study is performed using Python within Visual Studio Code. Since the workshop is limited to five hours, a Virtual Machine (VM) with predefined parameters is set up with all the necessary dependencies.

Within the VM, data exploration is done using Jupyter Notebooks. The data is versioned and kept track of using Data Version Control (DVC). The modelling algorithms come from scikit-learn. Model tracking is done with MLflow. The entire orchestration, i.e. the pipelines, is built with Airflow, and the deployment of the whole operation is made possible by Docker. The beauty of all of this? It's completely free, of course.
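To make the division of labor concrete, here is a hypothetical, stdlib-only sketch (not taken from the workshop material) of the kind of dependency graph an orchestrator like Airflow manages: each pipeline stage runs only after its upstream stages have completed. The task names and the mapping of tools to stages are illustrative assumptions:

```python
# Minimal sketch of a task graph: each key lists the upstream tasks
# that must finish before it may run, mirroring the workshop's stages.
from graphlib import TopologicalSorter

dag = {
    "version_data": set(),            # DVC: pin the dataset revision
    "train_model": {"version_data"},  # scikit-learn: fit on versioned data
    "track_run": {"train_model"},     # MLflow: log parameters and metrics
    "deploy": {"track_run"},          # Docker: ship the approved model
}

# Resolve the order in which the stages must execute.
execution_order = list(TopologicalSorter(dag).static_order())
```

A real Airflow DAG declares the same dependencies with operators and `>>` chaining, but the underlying idea is this topological ordering of stages.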

The lecturers make it clear that certain processes could be covered by the same tool. For example, MLflow does not only track models; it can also be used to orchestrate pipelines and deploy operations. However, to play each tool to its strengths, every step gets its own tool. The next objective is to implement everything we have learned on our own, hands-on.

As the participants take on the use case through predefined exercises, each step is carefully discussed and implemented. The dos and don'ts of every tool are carefully outlined, as well as how the tools complement each other. Each step of the pipeline, and its importance, is showcased in detail. The workshop is guided, but a certain amount of independent thinking is necessary to progress through each topic.

In summary, the workshop provided the participants with a great overview of how to build, orchestrate, and troubleshoot an entire ML pipeline on its way into production, using an interesting real-world case and open-source tools.

 

Figure: MLOps pipeline

Potential business implementation

Where could all of this be implemented in practice? First, it should be noted that the workshop uses open-source tools backed by large communities that consistently update and improve them. The keyword is open source. Plenty of startups and smaller businesses lack the financial resources to set up the managed services provided by Microsoft, Google, or Amazon. Open-source tools offer these businesses a great alternative for building ML pipelines free of charge. The downside is that setting up and then maintaining such pipelines requires more time and, depending on the project, might prove problematic in the long run when it comes to scalability. But as a proof of concept (PoC) to verify the potential impact of ML within a business, implementing the steps shown in this workshop can be the stepping stone a business needs to realize the true potential of ML applications.
