Skip to content

influxdb

Monitoring with InfluxDB and Grafana on Kubernetes

Four years later, I still have not gotten the hang of telegraf, I'm still running my own home-made detailed system and process monitoring reporting to InfluxDB running container-lessly in lexicon and I feel the time is up for moving these services into the Kubernetes cluster. Besides keeping them updated, what I'm most looking forward is leveraging the cluster's infrastructure to expose these services (only) over HTTPS with automatically renewed SSL certs.

Detailed system and process monitoring

Never got the hang of telegraf, it was all too easy to cook my own monitoring...

Humble Beginnings

In fact, when I started building detailed process monitoring I knew nothing about telegraf, influxdb, grafana or even Raspberry Pi computers.

It was back in 2017, when pondering whether to build my next PC around an Intel Core i7-6950X or an AMD Ryzen 5 1600X, that I started looking into measuring CPU usage of a specific process. I wanted to better see and understand whether more (but slower) CPU cores would be a better investment than faster (but fewer) CPU cores.

At the time my PC had a AMD Phenom II X4 965 BE C3 with 4 cores at 3.4GHz, and I had no idea how often those CPU cores were all used to their full extent. To learn more about the possibilities (and limitations) of fully multi-threading CPU-bound applications, I started running top commands in a loop and dumping lines in .csv files to then plot charts in Google Sheets. This was very crude, but it did show the difference between rendering a video in Blender (not multi-threaded) compared to using the pulverize tool to fully multi-thread the same task:

Chart of CPU usage over time showing a single Blender process never using much more than one CPU core

Chart of CPU usage over time showing how pulverize, running 4 Blender processes, most CPU cores most of the time

This early ad-hoc effort resulted in a few scripts to measure per-proccess CPU usage, overall CPU with thermals, and even GPU usage.