1. Development of a Data Processing Pipeline
This custom-built data processing pipeline was created to combine historical, static and live data into valuable output during test runs and races.
A first process captures live test data to make it available for historical and live analysis further down the pipeline.
Another building block is a data cleaning process that cleans (live and historical) test run data and restructures it into a unified data format that is easy to process.
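As an illustration of what such a cleaning step could look like, the sketch below maps two differently shaped inputs onto one record type. The `Sample` type, the input field names (`ts_ms`, `val`, `time`, `reading`, and so on) and the rule that naive timestamps are treated as UTC are all assumptions made for this example; they are not taken from the actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


# Hypothetical unified record; the field names are assumptions for illustration.
@dataclass(frozen=True)
class Sample:
    timestamp: datetime  # always timezone-aware, in UTC
    sensor: str          # lower-case sensor name
    value: float         # numeric reading


def clean_live(msg: dict) -> Optional[Sample]:
    """Normalize a hypothetical live message with epoch-millisecond timestamps."""
    try:
        ts = datetime.fromtimestamp(msg["ts_ms"] / 1000, tz=timezone.utc)
        return Sample(ts, str(msg["sensor"]).strip().lower(), float(msg["val"]))
    except (KeyError, TypeError, ValueError):
        return None  # drop malformed records instead of failing the stream


def clean_historical(row: dict) -> Optional[Sample]:
    """Normalize a hypothetical historical row with ISO 8601 timestamps."""
    try:
        ts = datetime.fromisoformat(row["time"])
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
        ts = ts.astimezone(timezone.utc)
        return Sample(ts, str(row["name"]).strip().lower(), float(row["reading"]))
    except (KeyError, TypeError, ValueError):
        return None
```

Once both sources produce the same record type, downstream steps no longer need to know whether a sample came from a live feed or from an archive.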
Another process joins the different data streams with static data (aerodynamic wind tunnel tests and CAD simulations) to calculate KPIs on the fly.
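A minimal sketch of such a join, assuming the static data can be reduced to a lookup table, is shown below. The table contents, the `config` and `speed_mps` field names, and the choice of aerodynamic drag power as the KPI are hypothetical; the formula itself is the standard drag power relation P = 0.5 · ρ · CdA · v³.

```python
AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air (an assumption)

# Hypothetical static wind tunnel table: drag area CdA (m^2) per aero configuration.
WIND_TUNNEL_CDA = {"low_drag": 0.60, "high_downforce": 0.85}


def drag_power_kw(speed_mps: float, cda: float) -> float:
    """Power needed to overcome aerodynamic drag, in kilowatts."""
    return 0.5 * AIR_DENSITY * cda * speed_mps ** 3 / 1000.0


def enrich(stream, static_table):
    """Join each live sample with the static table and attach the KPI.

    Samples whose configuration is missing from the static table are skipped.
    """
    for sample in stream:
        cda = static_table.get(sample["config"])
        if cda is None:
            continue
        yield {**sample, "drag_power_kw": drag_power_kw(sample["speed_mps"], cda)}
```

Because `enrich` is a generator, each KPI is computed as soon as its sample arrives, which matches the "on the fly" behaviour described above.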
Beyond a unified data processing pipeline, Klarrio wants to ensure that future car sensors and/or different live data sources can easily be coupled to the new data pipeline.
2. Future-oriented approach
On the ingest layer of the data processing pipeline, Klarrio implemented an MQTT broker so that all live sensor data resides on that messaging layer. In the future, additional sensors can be coupled to the pipeline by publishing their data to a specific MQTT topic.
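To make the coupling of a new sensor concrete, the sketch below publishes one reading as JSON to a per-sensor topic. The topic layout (`cars/<car_id>/sensors/<sensor>`), the payload fields and the helper names are assumptions for illustration. The function accepts any client object that exposes `publish(topic, payload, qos=...)`, which is the shape of the `publish` method on the paho-mqtt client; a small in-memory stand-in is included so the example runs without a broker.

```python
import json
import time


def sensor_topic(car_id: str, sensor: str) -> str:
    # Hypothetical topic layout; the real pipeline may use a different scheme.
    return f"cars/{car_id}/sensors/{sensor}"


def publish_reading(client, car_id, sensor, value, ts=None):
    """Serialize one reading as JSON and publish it to the sensor's topic."""
    payload = json.dumps({"ts": time.time() if ts is None else ts, "value": value})
    # QoS 1 asks the broker for at-least-once delivery.
    client.publish(sensor_topic(car_id, sensor), payload, qos=1)
    return payload


class RecordingClient:
    """In-memory stand-in for an MQTT client, used here only for demonstration."""

    def __init__(self):
        self.messages = []

    def publish(self, topic, payload=None, qos=0, retain=False):
        self.messages.append((topic, payload, qos))


client = RecordingClient()
publish_reading(client, "car-1", "tyre_temp_fl", 87.5, ts=1700000000.0)
```

In a real deployment, a connected MQTT client would take the place of `RecordingClient`; the publishing code would stay the same.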
Klarrio believes that a future-oriented data approach does not end with data engineering; upcoming data science ambitions are planned for as well. Developing new analytical models beyond the current scope requires new sets of data to be collected. Therefore, custom data scrapers were implemented to collect different sets of weather data: weather forecasts and satellite imagery.
3. Tools facilitating data science
The data processing pipeline is the backbone that provides insight into the race KPIs used to determine an optimal race strategy. Defining this strategy is an iterative process, so a replay functionality was added as a service to the pipeline to support it. The replay functionality is also helpful when the race strategist wants to train with the provided tools in preparation for upcoming races. Because the race strategies are based on machine learning models, the pipeline also makes it easy to swap newly trained models into production.
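The two ideas in this paragraph, replaying recorded data and swapping models, can be sketched in a few lines. This is only an illustration under stated assumptions: recordings are taken to be `(timestamp_seconds, payload)` pairs sorted by time, models are plain callables, and the names `replay` and `ModelSlot` are hypothetical rather than part of the actual service.

```python
import threading
import time


def replay(records, speed=1.0, sleep=time.sleep):
    """Yield recorded samples in order, pausing between them as in the original run.

    records: iterable of (timestamp_seconds, payload), sorted by timestamp.
    speed:   2.0 replays twice as fast as real time.
    sleep:   injectable so that tests can run without actually waiting.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    previous = None
    for ts, payload in records:
        if previous is not None:
            sleep(max(0.0, (ts - previous) / speed))
        previous = ts
        yield ts, payload


class ModelSlot:
    """Holds the model currently in use and allows it to be replaced at runtime."""

    def __init__(self, model):
        self._lock = threading.Lock()
        self._model = model

    def swap(self, new_model):
        with self._lock:
            self._model = new_model

    def predict(self, features):
        with self._lock:
            model = self._model  # take a reference, then release the lock
        return model(features)
```

Feeding the output of `replay` into `ModelSlot.predict` would let a strategist rerun the same recorded session against different models, which is one way the iterative process described above could be supported.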
To visualize the different race KPIs and experiment with newly trained AI models, an interactive dashboard was created.