Influxdb logger stopped working

Doug_Phoenix · January 14, 2024, 10:57pm

Hello,

I've been using Influxdb logger for a number of months now. On Dec 16, logging of data to my Influx database stopped working. I'm finally getting to troubleshooting now (long story).

I have InfluxDB running on a Windows PC. Yes, it's not a great idea, but I got it working several months ago. My other automation controller (Home Assistant) also points to this database, and I still see that data. (Curiously it went down around the same time, but I was able to bring it back up.)

I use Grafana through Home Assistant for convenience.

Here are some error log entries for Influx logger from today.

app:258 2024-01-14 03:42:52.083 PM warn method handleEvent of app InfluxDB Logger ran for 145,002ms
app:258 2024-01-14 03:42:27.180 PM error java.lang.RuntimeException: Failed to acquire semaphore for method handleEvent within 120 seconds (handleEvent)
app:258 2024-01-14 03:42:02.075 PM warn method handleEvent of app InfluxDB Logger ran for 166,287ms
app:258 2024-01-14 03:41:23.998 PM error java.lang.RuntimeException: Failed to acquire semaphore for method handleEvent within 120 seconds (handleEvent)
app:258 2024-01-14 03:41:13.613 PM warn method handleEvent of app InfluxDB Logger ran for 161,413ms
app:258 2024-01-14 03:41:07.260 PM error java.lang.RuntimeException: Failed to acquire semaphore for method handleInfluxResponse within 120 seconds (handleInfluxResponse)
app:258 2024-01-14 03:40:26.354 PM error java.lang.OutOfMemoryError: Java heap space (handleEvent)
app:258 2024-01-14 03:40:13.278 PM error java.lang.RuntimeException: Failed to acquire semaphore for method handleEvent within 120 seconds (handleEvent)
app:258 2024-01-14 03:39:07.153 PM warn Backlog of 174558 events queued for InfluxDB
app:258 2024-01-14 03:39:07.138 PM error Backlog of 174608 events exceeds limit of 5000: dropping 50 events (failsafe)
app:258 2024-01-14 03:37:49.941 PM error java.lang.OutOfMemoryError: Java heap space (handleInfluxResponse)

FYI, here are the memory-related logs in case they're useful. I'm running a C-7.

dev:423 2024-01-14 03:55:18.266 PM info System Monitor Memory free is online
dev:423 2024-01-14 03:55:18.252 PM info System Monitor Memory free value is 1941.7 MiB
dev:423 2024-01-14 03:55:03.348 PM info System Monitor Memory free is online
dev:423 2024-01-14 03:55:03.345 PM info System Monitor Memory free value is 1932.8 MiB
dev:423 2024-01-14 03:54:48.246 PM info System Monitor Memory free is online
dev:423 2024-01-14 03:54:48.239 PM info System Monitor Memory free value is 1934.1 MiB
dev:423 2024-01-14 03:54:33.314 PM info System Monitor Memory free is online
dev:423 2024-01-14 03:54:33.242 PM info System Monitor Memory free value is 1934.0 MiB

dev:422 2024-01-14 03:56:03.253 PM info System Monitor Memory use is online
dev:422 2024-01-14 03:56:03.248 PM info System Monitor Memory use value is 1854.3 MiB
dev:422 2024-01-14 03:55:48.355 PM info System Monitor Memory use is online
dev:422 2024-01-14 03:55:48.351 PM info System Monitor Memory use value is 1847.8 MiB
dev:422 2024-01-14 03:55:33.248 PM info System Monitor Memory use is online
dev:422 2024-01-14 03:55:33.243 PM info System Monitor Memory use value is 1850.3 MiB
dev:422 2024-01-14 03:55:18.483 PM info System Monitor Memory use is online
dev:422 2024-01-14 03:55:18.465 PM info System Monitor Memory use value is 1844.8 MiB
dev:422 2024-01-14 03:55:03.371 PM info System Monitor Memory use is online
dev:422 2024-01-14 03:55:03.367 PM info System Monitor Memory use value is 1853.7 MiB

Would someone please suggest next steps for troubleshooting? Thanks!

dennypage · January 14, 2024, 11:19pm

You have a massive backlog of events and are in the Pit Of Despair. There is no way out.
You got there by creating events way faster than they can be posted to InfluxDB.
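
As a back-of-the-envelope illustration of why such a backlog only grows: whenever events arrive faster than they are posted, the difference accumulates every second and the queue can never drain. The Python sketch below is a toy model, not InfluxDB-Logger's code; the helper names `backlog_after` and `seconds_to_drain` are hypothetical, the rates are made up, and it assumes constant arrival and post rates.

```python
import math


def backlog_after(seconds: float, events_per_sec: float,
                  posted_per_sec: float, start_backlog: float = 0.0) -> float:
    """Queued events after `seconds` (hypothetical constant-rate model).

    The queue grows by (arrivals - posts) each second and never goes below zero.
    """
    net_per_sec = events_per_sec - posted_per_sec
    return max(0.0, start_backlog + net_per_sec * seconds)


def seconds_to_drain(backlog: float, events_per_sec: float,
                     posted_per_sec: float) -> float:
    """Time to empty a backlog; infinite if posting never outpaces arrivals."""
    spare_per_sec = posted_per_sec - events_per_sec
    if spare_per_sec <= 0:
        return math.inf
    return backlog / spare_per_sec


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    # Hypothetical rates: 5 events/s arrive, only 4 events/s get posted.
    print(backlog_after(one_day, 5, 4))      # 86400.0
    print(seconds_to_drain(1000, 5, 4))      # inf
```

With those hypothetical rates, one day leaves 86,400 events queued, and `seconds_to_drain` returns infinity for any backlog because posting never catches up. Only reducing the event rate or raising the post rate changes that outcome.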

Your best move is to delete the app and start over.

Suggestions for going forward:

- Use multiple instances of InfluxDB-Logger.
- Subscribe to fewer devices and/or attributes.
- Avoid attempting to log fast moving attributes such as motion sensors.
- Do not use keep-alive (softpoll) events unless absolutely needed.
  - If you must use keep-alive events, use a high interval (15 min or more).
  - If you must use keep-alive events for some devices/attributes, use a separate InfluxDB-Logger instance for those devices/attributes.
- Use a higher batch size limit.
- Use a lower backlog size limit.
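
The last two suggestions (a higher batch size limit and a lower backlog size limit) can be illustrated with a toy queue simulation. The sketch below assumes, purely for illustration, that one post happens per abstract "tick" regardless of batch size, and that any events beyond the backlog limit are simply discarded. The function name `simulate` and all numbers are hypothetical, and InfluxDB-Logger's real posting and failsafe logic may differ. Capping the backlog presumably matters because queued events consume hub memory, and the logs above show `java.lang.OutOfMemoryError: Java heap space` alongside a backlog of over 170,000 events.

```python
def simulate(arrivals_per_tick: int, batch_size: int, backlog_limit: int,
             ticks: int) -> tuple[int, int, int]:
    """Toy queue model returning (posted, dropped, final_backlog).

    Hypothetical assumptions, not InfluxDB-Logger's actual logic:
    - each tick, `arrivals_per_tick` new events are queued;
    - each tick, one post sends up to `batch_size` queued events;
    - any backlog above `backlog_limit` is then discarded as a failsafe.
    """
    backlog = posted = dropped = 0
    for _ in range(ticks):
        backlog += arrivals_per_tick        # new events join the queue
        sent = min(batch_size, backlog)     # one batch is posted per tick
        backlog -= sent
        posted += sent
        if backlog > backlog_limit:         # failsafe: drop the overflow
            dropped += backlog - backlog_limit
            backlog = backlog_limit
    return posted, dropped, backlog


if __name__ == "__main__":
    # Hypothetical workload: 60 events arrive per tick for 1000 ticks.
    print(simulate(60, 50, 5000, 1000))   # (50000, 5000, 5000): batch too small
    print(simulate(60, 100, 5000, 1000))  # (60000, 0, 0): batch keeps up
    print(simulate(60, 50, 500, 1000))    # (50000, 9500, 500): lower limit
```

In this model, a batch size that keeps pace with arrivals prevents any backlog at all, while a lower backlog limit bounds the queue (and so memory) at the cost of dropping more events when the logger falls behind.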

Doug_Phoenix · January 14, 2024, 11:57pm

Thank you, @dennypage !

InfluxDB logger was plugged up for a month, so perhaps I should not be surprised. I'm reinstalling now. I'll leave out switches and a motion sensor that I wasn't using.

I'm fortunate to have received an expert reply so quickly (on a Sunday, no less). Thanks again!

Doug_Phoenix · January 15, 2024, 4:18am

I see data from Hubitat coming in to Influx now. Plots in Grafana are being populated.

Hats off for the great help!

iskren.p.petkov · July 19, 2024, 3:14am

Thank you for the great eye-opening strategy on using InfluxDB Logger more efficiently!!