Today, for the second time, my hub's web interface went totally unresponsive. My automations and Z-Wave continued to work, however.
I noticed this because I have an automation, triggered by a Z-Wave button, that turns a light on and off to indicate I'm in a meeting. When the light turns on or off, a notification is sent to mobile app users so that, in addition to the light, everyone knows I'm in a meeting and not to disturb me or enter.
Today I initiated the meeting and received the notification. However, just 10-20 minutes later, when the meeting was over and I used the button to turn off the light, I didn't get the notification that the meeting was over. I went out of my office to verify that the button actually worked (it turned off the light) and that its battery hadn't somehow died. The light was off, so I know the automations were working.
I tried to pull up my hub in the mobile app, but it was unresponsive. I tried the web interface: also unresponsive. I tried the diagnostic tool: also unresponsive. The only option I felt I had at that point was to remove power from the unit for 30 seconds and then reapply it. Thinking about it now, I probably should have unplugged and replugged the Ethernet cable first to see if that changed anything.
Once the hub came back up, I looked in the logs and noticed that DNS lookups appear to have started failing earlier this morning. I wonder if those errors piled up in some way and caused the unresponsiveness. I have attached a screenshot of those errors. I'm not sure why this is happening, but I wanted to make someone aware of it, and I'm happy to help in whatever way I can.
Once the hub is rebooted, things work normally: DNS resolves as expected, and the test mentioned above runs fine. To me this looks like either the Ethernet interface is going down somehow at the hardware level, the network stack is borked, or some combination of the two. I forgot to try pinging the interface, so I don't know whether it was responding or not. I'll have to wait for this to happen again to do more extensive tests. What I find interesting is that the DNS failures started hours before I noticed the problem. That's what makes me lean toward the networking stack/interface getting borked, but without console access everything is pure speculation. I feel confident it's not my network: I'm running Ubiquiti gear and I've had no other problems on the network. I can't 100% rule it out, but given the symptoms I would be surprised.
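Since I'll have to catch it in the act, a small watchdog would at least pin down exactly when the interface drops. This is just a sketch, meant to be run from cron every minute; the hub IP and log path are from my network, so adjust for yours:

```shell
#!/bin/sh
# Watchdog sketch: log timestamped up/down transitions for the hub.
# HUB_IP and LOG are assumptions from my setup; override via environment.
HUB_IP="${HUB_IP:-192.168.200.30}"
LOG="${LOG:-$HOME/hub-watchdog.log}"

check_host() {
    # Prints "up" if a single ping gets a reply within ~2s, else "down".
    # (-W is seconds on Linux; on macOS it takes milliseconds.)
    ping -c 1 -W 2 "$1" >/dev/null 2>&1 && echo up || echo down
}

state=$(check_host "$HUB_IP")
last=$(tail -1 "$LOG" 2>/dev/null | awk '{print $NF}')
# Only log state *changes*, so the file stays small and transitions stand out.
if [ "$state" != "$last" ]; then
    echo "$(date '+%F %T') hub is $state" >>"$LOG"
fi
```

A cron line like `* * * * * /path/to/hub-watchdog.sh` would give minute-level resolution on when the hub stopped answering, which lines up nicely against the DNS errors in the hub's own logs.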
1> When you lose the web interface, can you get to yourhubip:8081?
2> Do a soft reset and restore. This will ensure your database is not corrupt, which can cause lockups. After that, shut down the hub and unplug it for 1 minute. Power back up and update to the latest platform.
3> Do you have jumbo frames enabled on any network equipment (this includes your router and any switches you have)? (I realize that's unlikely since your hub is still functioning, but it's something to check.)
4> In yourhubip>>Settings>>Network Settings, is your speed set to fixed or auto? If fixed, set it to auto.
In the end, though, I suspect #2 will likely fix you up.
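For point 1, a quick way to probe both the normal UI port and the diagnostic port from a workstation is something like the sketch below; 192.168.200.30 stands in for your hub's IP:

```shell
#!/bin/sh
# Probe sketch: check whether anything answers on the hub's web (80)
# and diagnostic (8081) ports. The IP is a stand-in; use your hub's.
HUB="${HUB:-192.168.200.30}"

probe() {
    # Prints the HTTP status code for a URL; curl reports 000 when
    # nothing answers (timeout or connection refused). -m 5 caps the wait.
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

echo "port 80:   $(probe "http://$HUB/")"
echo "port 8081: $(probe "http://$HUB:8081/")"
```

If port 8081 answers while port 80 doesn't, that points at the hub platform rather than the network; if neither answers (and pings also fail), the problem is lower down.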
As expected, it happened again. This time, with a clearer head, I was able to do better troubleshooting.
Pinging the device returned the following:
PING 192.168.200.30 (192.168.200.30): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
^C
--- 192.168.200.30 ping statistics ---
12 packets transmitted, 0 packets received, 100.0% packet loss
OK. So it appears that the device is no longer responding on the network. Taking a look around, I don't see any errors or problems with the network gear. I went to the C-7 and visually inspected it: green light.
This time, instead of power cycling, I decided to remove the Ethernet cord from the C-7, as mentioned previously, for a few seconds and then plug it back in. I went back to my terminal and attempted the ping again:
PING 192.168.200.30 (192.168.200.30): 56 data bytes
64 bytes from 192.168.200.30: icmp_seq=0 ttl=64 time=3.762 ms
64 bytes from 192.168.200.30: icmp_seq=1 ttl=64 time=1.476 ms
64 bytes from 192.168.200.30: icmp_seq=2 ttl=64 time=1.468 ms
64 bytes from 192.168.200.30: icmp_seq=3 ttl=64 time=1.553 ms
64 bytes from 192.168.200.30: icmp_seq=4 ttl=64 time=1.506 ms
64 bytes from 192.168.200.30: icmp_seq=5 ttl=64 time=1.336 ms
64 bytes from 192.168.200.30: icmp_seq=6 ttl=64 time=1.478 ms
64 bytes from 192.168.200.30: icmp_seq=7 ttl=64 time=1.714 ms
^C
--- 192.168.200.30 ping statistics ---
8 packets transmitted, 8 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.336/1.787/3.762/0.753 ms
The device was back up. So it's clear that one of two things is happening, as I previously suspected:
The ethernet interface is getting borked.
The Hubitat DHCP client software isn't renewing the IP lease.
Currently, I have DHCP assigning a fixed IP to this network client (the Hubitat). To remove the DHCP possibility, I'm going to switch to a static IP configured in Hubitat itself. Then, if it happens again, we'll know the interface is borking rather than failing to renew its IP lease.
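One more check worth doing the next time it locks up, assuming a Linux or macOS workstation on the same subnet: look at the ARP/neighbor table. If the hub's MAC address still resolves while pings time out, the Ethernet interface is alive at layer 2 and the problem is higher up (IP stack/DHCP); if the MAC doesn't resolve either, the interface itself is likely down. A sketch:

```shell
#!/bin/sh
# ARP check sketch: distinguish a dead Ethernet interface (layer 2)
# from a broken IP stack/DHCP lease (layer 3). The IP is from my network.
HUB="${HUB:-192.168.200.30}"

l2_status() {
    # One ping to force an ARP request, then inspect the neighbor cache.
    ping -c 1 -W 2 "$1" >/dev/null 2>&1 || true
    # "ip neigh" is the Linux tool; "arp -n" covers macOS/older systems.
    if ip neigh show "$1" 2>/dev/null | grep -q lladdr ||
       arp -n "$1" 2>/dev/null | grep -qi '..:..:..:..:..:..'; then
        echo resolved    # MAC known: NIC answers ARP, problem is above L2
    else
        echo unresolved  # no MAC: interface itself looks dead
    fi
}

echo "$HUB MAC: $(l2_status "$HUB")"
```

"Resolved but not pingable" would point at the DHCP/IP-stack theory; "unresolved" would point at the interface/hardware theory.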
I will say this hadn't happened until recently, so that also means one of two things: either my C-7 suddenly has a hardware fault and is slowly dying, or a recent platform version introduced this bug. Given that I've seen similar complaints, I'm leaning toward a software bug.
I wish I had lower-level access to the system logs, as I feel confident that, looking at them, it would be clear what is happening. Maybe support can somehow look at my logs and determine which of these possibilities it actually is.
Hmmm. So I put the static IP configuration in place, and some DNS lookups appear to be failing. I have AdGuard Home (v0.107.48) as my primary DNS server. When I go to Hubitat's network test, it is unable to resolve addresses to ping.
The DNS server itself is working as expected, so I don't believe the problem is there. However, I noticed that my AdGuard version is not quite up to date; I will update that just in case.
; <<>> DiG 9.10.6 <<>> @192.168.200.2 aws.amazon.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12999
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;aws.amazon.com. IN A
;; ANSWER SECTION:
aws.amazon.com. 7200 IN CNAME tp.8e49140c2-frontier.amazon.com.
tp.8e49140c2-frontier.amazon.com. 60 IN CNAME dr49lng3n1n2s.cloudfront.net.
dr49lng3n1n2s.cloudfront.net. 60 IN A 13.33.4.84
dr49lng3n1n2s.cloudfront.net. 60 IN A 13.33.4.125
dr49lng3n1n2s.cloudfront.net. 60 IN A 13.33.4.99
dr49lng3n1n2s.cloudfront.net. 60 IN A 13.33.4.16
;; Query time: 96 msec
;; SERVER: 192.168.200.2#53(192.168.200.2)
;; WHEN: Thu Jun 20 09:24:18 EDT 2024
;; MSG SIZE rcvd: 185
I did reboot after changing the network settings. I'm going to switch to the gateway's DNS server and see if that improves anything.
In the process of setting a static IP on the device itself, I noticed that the interface speed was set to fixed, and I changed it to auto. After rebooting I still had the DNS problems noted above, so I don't believe the interface speed has anything to do with it.
If you disable autonegotiation, it hides link drops and other physical-layer problems. Only disable autonegotiation to end devices, such as older NICs that do not support it. Do not disable autonegotiation between switches unless absolutely required, as physical-layer problems can go undetected and result in spanning-tree loops.
A few years back, I used to have an inexpensive Asus Gigabit network switch that caused problems exactly like this. Unplugging a network cable and then reconnecting it would resolve the issue temporarily. If possible, can you try plugging your Hubitat hub into a different network port on your network switch? Or possibly use a different network switch altogether? Just another hypothesis for you to investigate.
You may also want to replace the Ethernet patch cord you're using for the HE hub, and try a different USB power supply, just in case the one you have is starting to fail.
So I didn't explicitly enable or disable the speed setting. My best guess is that this is the default, or that it carried over from my C-5 settings during migration.
No, I have not done a soft reset, nor do I think one is required for the symptoms demonstrated here. As mentioned, simply unplugging and replugging the Ethernet cable brought the interface back up. That in no way points to a database problem or the need for a soft reset; I don't believe either of those rebuilds the OS, so I just can't fathom it solving this problem.
Clearly, from what I said earlier, if the interface isn't responding to pings, then the diagnostic port isn't going to respond either, since it is bound to the same interface. So no, when it happens, the diagnostic tool does not respond.
Sorry, I didn't see the bit about unplugging the Ethernet... Though I would still keep the speed set to auto. You're right that if it was set to fixed on the C-5, that config would have migrated.