Data retention for graphing

So I have been doing some experiments with InfluxDB3 recently to see if there are good reasons to move to it. I keep finding what I think are some pros.

I have a fully functional instance of InfluxDB3 up and running side by side with my v2 instance. It is being populated by the MQTT Exporter from Hubitat. One of the interesting things was seeing how much raw data I have in it. I have a few tables with over 20 million records and many between 1 million and 20 million.

I am starting to look for ways to reduce that number, and I am curious if anyone else has tried to tackle this. Right now, with InfluxDB3, I am testing one of the provided plugins, a downsampler. Simply put, it takes a measurement, analyzes the values over given intervals, and then outputs some basic numbers that represent that measurement: calculated/derived values like the average, min, and max for each interval. This can consolidate the values considerably. That, combined with a short retention period for the raw data, can keep the retained data small. This could be good for long-term data where I don't need to dig into the fine details of what happened on a given day.
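To make the downsampling idea concrete, here is a minimal Python sketch of the general technique: bucket raw (timestamp, value) points into fixed windows and keep only the average, min, and max per window. This is not the InfluxDB3 plugin's code; the function name `downsample` and the sample data are hypothetical.

```python
from collections import defaultdict


def downsample(points, interval_s):
    """Group (unix_ts, value) points into fixed windows and summarize each.

    Returns a list of (window_start, mean, min, max) tuples sorted by window.
    Hypothetical helper for illustration, not the InfluxDB3 plugin's API.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its interval window.
        window_start = ts - (ts % interval_s)
        buckets[window_start].append(value)

    summary = []
    for start in sorted(buckets):
        vals = buckets[start]
        summary.append((start, sum(vals) / len(vals), min(vals), max(vals)))
    return summary


# Hypothetical temperature readings every minute for 10 minutes,
# downsampled into 5-minute (300 s) windows.
raw = [(i * 60, 20.0 + i) for i in range(10)]
for row in downsample(raw, 300):
    print(row)
# (0, 22.0, 20.0, 24.0)
# (300, 27.0, 25.0, 29.0)
```

Ten raw rows become two summary rows here. With one reading per minute, a 1-hour window would turn 60 rows into 1, which is where most of the size reduction comes from once the raw data ages out.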

The second option I am looking at is using SQL to go in and remove duplicate records where the value hasn't changed. Think of something like a contact sensor that has been closed for months but still writes a record every x minutes. Removing those could eliminate a significant number of unneeded records. I know InfluxDB Logger has some logic to prevent records from being posted when the value hasn't changed, but I am sure the version I used before @dennypage took it over didn't do that. Now that I am experimenting with MQTT Exporter, I don't think it does that either. It seems to post updated values for every attribute of a device if anything changes on that device.
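Here is a rough sketch of the "only keep changes" logic. It compares each record to the previous kept value and drops repeats, which is the same comparison a SQL window function such as `LAG` would express against the prior row. The function name and data are hypothetical, and this works on in-memory records rather than deleting anything from InfluxDB.

```python
def drop_unchanged(records):
    """Keep only records whose value differs from the previous kept record.

    `records` is an iterable of (timestamp, value) pairs sorted by timestamp.
    The first record is always kept so the starting state is preserved.
    Hypothetical helper for illustration.
    """
    kept = []
    last_value = object()  # sentinel that never equals a real value
    for ts, value in records:
        if value != last_value:
            kept.append((ts, value))
            last_value = value
    return kept


# Hypothetical contact sensor reporting every 5 minutes.
contact = [(0, "closed"), (300, "closed"), (600, "closed"),
           (900, "open"), (1200, "open"), (1500, "closed")]
print(drop_unchanged(contact))
# [(0, 'closed'), (900, 'open'), (1500, 'closed')]
```

One trade-off: the repeated rows also act as a heartbeat showing the device was still reporting. Once they are removed, a graph has to carry the last value forward (step interpolation) until the next change.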

I would imagine this is an issue that hits everyone who retains data for long-term visualization. I am just curious how others are handling this, even if they don't use InfluxDB.

I just recently started using this:

You set limits on the number of records kept in the files for the 5-minute, hourly, daily, and weekly buckets, and it rotates the old records out as new ones come in.
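The rotation described here behaves like a fixed-size buffer per bucket: once the limit is reached, each new record pushes out the oldest one. Below is a minimal Python sketch of that behavior using `collections.deque(maxlen=...)`. The class name, bucket names, and limits are hypothetical and are not taken from the app's code or settings.

```python
from collections import deque


class RotatingStore:
    """Keep at most N records per bucket; the oldest record is dropped on overflow.

    Hypothetical illustration of record-count rotation, not the app's implementation.
    """

    def __init__(self, limits):
        # One bounded deque per bucket, e.g. {"5min": 288, "hourly": 168}.
        self.buckets = {name: deque(maxlen=n) for name, n in limits.items()}

    def add(self, bucket, record):
        # A deque with maxlen discards from the left (oldest) when full.
        self.buckets[bucket].append(record)


# Hypothetical limit of 3 records in the 5-minute bucket.
store = RotatingStore({"5min": 3, "hourly": 24})
for i in range(5):
    store.add("5min", (i * 300, 20.0 + i))
print(list(store.buckets["5min"]))
# [(600, 22.0), (900, 23.0), (1200, 24.0)]
```

Because each bucket is capped by count, storage stays constant no matter how long the hub runs. For example, a 5-minute bucket capped at 288 records always holds roughly the last 24 hours.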


That looks like Watchtower has some built-in methods for downsampling. I wonder what logic it uses for each of these accuracy buckets, like how it determines what value to put in there. Or is it just the current value at those increments?

If it is some kind of downsampling, it's pretty cool that they have it built in for numeric metrics like temperature, humidity, and CPU usage. It may not work well for state attributes, like a contact being open/closed. Do you know if it does the same accuracy/reduction for those kinds of metrics?

Update: I installed Watchtower on my dev hub to take a look at it. It is an interesting solution, and the app explains the accuracy buckets a bit more. It states that each bucket takes an average of the previous (finer) bucket over its interval. So the 1-hour bucket takes the average of the 5-minute values from the last hour, the 1-day bucket averages the previous day's 1-hour values, and so on. That completely aligns with the downsampling I was referring to.
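Based only on that description (not on Watchtower's actual code), the cascade can be sketched in Python: each coarser bucket is built by averaging fixed-size groups of the next finer bucket. The function name and sample values are hypothetical.

```python
def average_buckets(values, group_size):
    """Average consecutive groups of `group_size` values (one coarse value per group).

    Incomplete trailing groups are ignored in this sketch.
    Hypothetical helper illustrating cascaded averaging.
    """
    return [
        sum(values[i:i + group_size]) / group_size
        for i in range(0, len(values) - group_size + 1, group_size)
    ]


# Two hours of hypothetical 5-minute averages (12 per hour).
five_min = [20.0] * 12 + [22.0] * 12
hourly = average_buckets(five_min, 12)
print(hourly)  # [20.0, 22.0]

# A daily value would average 24 hourly values the same way;
# here we average the 2 available hours just to show the cascade.
summary = average_buckets(hourly, 2)
print(summary)  # [21.0]
```

An average of averages equals the true average of the raw data only when every finer bucket covers the same number of raw samples. If a sensor reports irregularly, the cascaded value can drift slightly from an average taken directly over the raw readings.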
