Endurance testing during prototyping is always a challenge, especially when it involves drive units running continuously for more than 30 days. One of our clients faced a resilience problem: how could they ensure they didn't lose a single data point after an unexpected event such as a crash or even a fire? Beyond that, how could they keep an eye on test progress remotely and make sure that everyone who could detect an issue with the prototype could see the data without hassle? Solving this could save weeks of development time and increase the tempo of R&D.
Got a testbench and want to ensure every log is safe and accessible from anywhere? Contact us today to implement a monitoring solution tailor-made to your needs.
The testbench: a vital tool to validate R&D concepts
The testbenches in question are custom-built and equipped with powerful servos. They are driven by Simulink, which controls the simulations and data flows. But with great power comes great responsibility, and in this case that responsibility meant keeping track of everything that happened during these extended test runs, even when no one was physically present.
Our client needed a way to follow the testing progress from anywhere, preferably without being trapped in a control room. The ideal solution? Something accessible from any browser, capable of streaming live metrics without missing a tick.
Equally important was the need to safeguard the logs generated during these marathon tests. With hundreds of metrics tracked at millisecond resolution, the overall volume of data was immense. Losing even a small portion could mean the difference between a successful test and weeks of wasted effort. A crash was a serious risk: without proper backups, it could wipe out crucial logs.
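To give a feel for the scale, here is a back-of-the-envelope estimate. The figures are assumptions for illustration, not measurements from the client's bench: 200 metrics, one sample per metric per millisecond, each value stored as an 8-byte float, over a 30-day run.

```python
# Rough estimate of the raw data volume of one endurance run.
# All four figures below are illustrative assumptions.
N_METRICS = 200          # assumption: "200+" metrics, rounded down
SAMPLE_RATE_HZ = 1_000   # assumption: one sample per metric per millisecond
BYTES_PER_VALUE = 8      # assumption: values stored as float64
TEST_DAYS = 30           # assumption: a 30-day run


def raw_volume_bytes(metrics: int, rate_hz: int, bytes_per_value: int, days: int) -> int:
    """Uncompressed size of the values alone (no timestamps, no overhead)."""
    seconds = days * 24 * 60 * 60
    return metrics * rate_hz * bytes_per_value * seconds


total = raw_volume_bytes(N_METRICS, SAMPLE_RATE_HZ, BYTES_PER_VALUE, TEST_DAYS)
print(f"{total / 1e12:.2f} TB")  # prints "4.15 TB"
```

Under these assumptions, a single run produces several terabytes before compression, which is why both the storage engine and the backup strategy matter.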
The solution: an efficient ClickHouse & Grafana setup
After some brainstorming and a few rounds of coffee, we designed a four-part solution tailored to address that problem effectively.
- UDP streaming of the metrics: Simulink has a module that sends all 200+ metrics in real time over UDP. This makes the data available on the network, so we can use traditional data engineering patterns like pub/sub to smoothly ingest this firehose of data at scale.
- A ClickHouse database: After a quick sanity check, we store all the metrics in ClickHouse, an OLAP database known for processing large volumes of data at high speed. This lets us not only store the metrics but also analyze them with the large function library the database provides. Even better: the data is available in near real time, during the test!
- Grafana for visualization and alerts: To make the data accessible to everyone involved, we deployed Grafana. With its powerful dashboard and alerting capabilities, team members could create their own dashboards, set up alerts for any malfunctions, and stay updated on the test's progress, whether they were in the office or outside. With minimal training, they could work autonomously with the dataset.
- Offsite log backup: Finally, to address the risk of data loss, we implemented an offsite backup solution for the logs. This way, even in the unlikely event of a crash or fire, the precious data was securely stored and easily recoverable.
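The first two parts (receive over UDP, sanity-check, store in batches) can be sketched as follows. This is a minimal illustration, not the client's implementation. The packet layout (200 little-endian float64 values per datagram), the port number, and the names `decode_packet`, `Batcher`, and `listen` are all assumptions. The `flush` callback stands in for whatever bulk insert the chosen ClickHouse client offers.

```python
import math
import socket
import struct
import time
from typing import Callable, List, Tuple

N_METRICS = 200                            # assumption: the text only says "200+"
PACKET = struct.Struct(f"<{N_METRICS}d")   # assumed layout: little-endian float64 values

Row = Tuple[float, int, float]             # (receive time, metric index, value)


def decode_packet(payload: bytes, received_at: float) -> List[Row]:
    """Sanity-check one datagram and turn it into rows.

    Datagrams of the wrong size are dropped entirely; NaN or infinite
    values are skipped individually.
    """
    if len(payload) != PACKET.size:
        return []
    values = PACKET.unpack(payload)
    return [(received_at, i, v) for i, v in enumerate(values) if math.isfinite(v)]


class Batcher:
    """Buffers rows and hands them to `flush` in large batches."""

    def __init__(self, flush: Callable[[List[Row]], None], max_rows: int = 100_000):
        self.flush = flush
        self.max_rows = max_rows
        self.rows: List[Row] = []

    def add(self, rows: List[Row]) -> None:
        self.rows.extend(rows)
        if len(self.rows) >= self.max_rows:
            self.flush(self.rows)
            self.rows = []


def listen(host: str, port: int, batcher: Batcher) -> None:
    """Receive datagrams forever and feed them to the batcher."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            payload, _addr = sock.recvfrom(65535)
            batcher.add(decode_packet(payload, time.time()))


# Example wiring (hypothetical port; replace print with a real bulk insert):
# listen("0.0.0.0", 9000, Batcher(flush=lambda rows: print(len(rows), "rows")))
```

Batching matters here because columnar databases such as ClickHouse generally perform better with a few large inserts than with many tiny ones. A production version would also flush on a timer so that a quiet period does not leave rows sitting in memory.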
The benefits: peace of mind and control
The solution we delivered did more than just check the boxes: it gave our client peace of mind. They could now monitor the tests from anywhere, analyze the data in real time, and sleep peacefully knowing that all the logs were safely backed up. Alerts ensured that any issues were flagged immediately, allowing for quick intervention if something went wrong.
In the end, what started as a daunting challenge turned into a seamless system that not only met our client's needs but also empowered them to push the boundaries of their endurance testing. All without breaking a sweat, or losing a single log.