Time Series Data: A Recurring Theme

When I graduated from college (over three-and-a-half decades ago), I had an eclectic mix of skills. I obtained degrees in economics and political science. But I spent a lot of my personal time building computers and writing computer programs. I also spent a lot of my class time learning about econometrics – that is, the representation of economic systems in mathematical/statistical models. While studying, I began using SPSS to analyze time series data.

Commercial Tools (and IP) Ruled

When I started my first job, I used SPSS for all sorts of statistical studies. In particular, I built financial models for the United States Air Force so that they could forecast future spending on the Joint Cruise Missile program. But within a few years, the SPSS tool was superseded by a new program out of Cary, NC. That program was the Statistical Analysis System (a.k.a., SAS). And I have used SAS ever since.

At first, I used the tool as a very fancy summation engine and report generator. It even served as the linchpin of a test-bed generation system that I built for a major telecommunications company. In the nineties, I began using SAS for time series data analysis. In particular, we piped CPU statistics (in the form of RMF and SMF data) into SAS-based performance tools.

Open Source Tools Enter The Fray

As the years progressed, my roles changed and my use of SAS (and time series data) began to wane. But in the past decade, I started using time series data analysis tools to once again conduct capacity and performance studies. At a major financial institution, we collected system data from both Windows and Unix systems throughout the company. And we used this data to build forecasts for future infrastructure acquisitions.

Yes, we continued to use SAS. But we also began to use tools like R. R became a pivotal tool in most universities. But many businesses still used SAS for their “big iron” systems. At the same time, many companies moved from SAS to Microsoft-based tools (including MS Excel and its pivot tables).

TICK Seizes Time Series Data Crown

Over the past few years, “stack-oriented” tools have emerged as the next “new thing” in data centers. [Note: Stacks are like clouds; they are everywhere and they are impossible to define simply.] Most corporations have someone’s “stack” running their business – whether it be Amazon Web Services (AWS), Microsoft Azure, Docker, Kubernetes, or a plethora of other tools. And most commercial ventures are choosing hybrid stacks (with commercial and open source components).

And the migration towards “stacks” for execution is encouraging the migration to “stacks” for analysis. Indeed, the entire shift towards NoSQL databases is being paired with a shift towards time series databases. Today, one of the hottest “stacks” for analysis is TICK (i.e., Telegraf, InfluxDB, Chronograf, and Kapacitor).

TICK Stack @ Home

As with most of my projects, I stumbled onto the TICK stack. I use Home Assistant to manage a plethora of IoT devices. And as the device portfolio has grown, my need for monitoring these devices has also increased. A few months ago, I noted that an InfluxDB add-on could be found for Hass.io. So I installed the add-on and started collecting information about my Home Assistant installation.

Unfortunately, the data that I collected soon began to outgrow the SD card in my Raspberry Pi. So after running the system for a few weeks, I decided to turn the data collection off – at least until I solved some architectural problems. And so the TICK stack went on the back burner.
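In hindsight, a back-of-the-envelope estimate would have warned me about the SD card. Here is a minimal sketch of that arithmetic. Every number in it (the entity count, the reporting interval, the bytes per point, and the free space) is an assumed placeholder rather than a measurement from my system, and the function name is simply one that I made up for illustration.

```python
# Rough storage-growth estimate for a time series database.
# Every input below is an assumed placeholder, not a measured value.

def days_until_full(free_bytes: float, points_per_day: float,
                    bytes_per_point: float) -> float:
    """Return how many days of collection fit into the free space."""
    daily_bytes = points_per_day * bytes_per_point
    if daily_bytes <= 0:
        raise ValueError("daily write volume must be positive")
    return free_bytes / daily_bytes


# Hypothetical workload: 200 entities, each reporting every 30 seconds.
entities = 200
samples_per_day = 24 * 60 * 60 // 30          # samples per entity per day
points_per_day = entities * samples_per_day   # total points written per day
bytes_per_point = 50                          # assumed on-disk cost per point
free_bytes = 8 * 1024**3                      # assumed 8 GiB free on the card

estimate = days_until_full(free_bytes, points_per_day, bytes_per_point)
print(f"Card fills in roughly {estimate:.0f} days")
```

The real on-disk cost per point depends on compression and on how many tags and fields each point carries, so the result should be treated as an order-of-magnitude guide only.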

I had solved a bunch of other IoT issues last week. So this week, I decided to focus on getting the TICK stack operational within the office. After careful consideration, I concluded that the test cases for monitoring would be a Windows/Intel server, a Windows laptop, my Pi-hole server, and my Home Assistant instance.

Since I was working with my existing asset inventory, I decided to host the key services (or daemons) on my Windows server. So I installed Chronograf, InfluxDB, and Kapacitor on that system. Since there was no native support for a Windows service install, I used the Non-Sucking Service Manager (NSSM) to create the relevant Windows services. At the same time, I installed Telegraf on a variety of desktops, laptops, and Linux systems. After only a few hiccups, I finally got everything deployed and functioning automatically. Phew!
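For anyone curious about what actually travels from Telegraf to InfluxDB, the data is written as text in InfluxDB's line protocol: a measurement name, optional tags, one or more fields, and a timestamp. The sketch below builds one such line by hand. It is only an illustration of the format, not how Telegraf itself is implemented; the function names are my own, and the host name and value in the usage line are made up.

```python
# Minimal sketch of InfluxDB line protocol:
#   measurement,tag_key=tag_value field_key=field_value timestamp
# Helper names here are hypothetical; this is not Telegraf's code.

def _escape(value: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that appears in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):          # check bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry a trailing 'i'
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'                   # strings are double-quoted


def to_line_protocol(measurement: str, tags: dict, fields: dict,
                     timestamp_ns: int) -> str:
    """Build one line protocol record with a nanosecond timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):             # sorted tag keys are recommended
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(val)
        for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Made-up sample point: idle CPU percentage reported by one host.
print(to_line_protocol("cpu", {"host": "pi-hole"}, {"usage_idle": 97.5},
                       1700000000000000000))
```

In practice, Telegraf's input plugins gather the values and its InfluxDB output plugin handles the formatting and the writes, so none of this needs to be written by hand.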

Bottom Line

I deployed the TICK components on a large number of systems. And I am now collecting all sorts of time series data from across the network. As I think about what I’ve done in the past few days, I realize just how important it is to stand on the shoulders of others. A few decades ago, I would have paid thousands of dollars to collect and analyze this data. Today, I can do it with only a minimal investment of time and materials. And given these minimal costs, it will be possible to apply the same approach to almost every DevOps engagement that arises.