Hi there, we have a few devices that are constantly logged in online. If the internet connection gets interrupted, they get logged out, and we have to go log them all back in manually with their security tokens. We just fixed a long-standing issue with these disconnects a few months ago (it appeared to be related to the ethernet in the walls), so it's disheartening to see them come back with the only real change being the removal of the Wink and the addition of the Hubitat.
I've been running the Logs in a browser tab to see if there are any errors around the time the devices disconnect.
In one case, I suspect it might have had to do with my PowerView hub being randomly inaccessible. Network seemed to stabilize a bit after I removed the app and devices, but not exactly sure if that was the issue.
In another case, I installed a PetSafe app and device, and it threw a Java UnknownHostException error while trying to run refreshDevices.
The Hubitat is plugged directly into our primary router in the middle of the house. We have one other router working as a wifi mesh node in another part of the house.
Can someone who knows more about network management help me understand why and how a device's internet connection can be interrupted by another device on the same network? I know that's kind of a broad question, so links to beginner IT concepts would be welcome lol. I've tried in the past to figure out how to run logs on the router itself, or from my laptop, to see exactly what's happening to the network when this happens but I can't make heads or tails of them and don't even know if I was logging the right things.
Unless you have apps that use the network, the hub only uses it for time sync or when you access it. You can actually unplug it from the router and it will still control all of the Zigbee and Z-Wave devices.
As far as interrupting goes, a few ways come to mind immediately:
1. Two devices have the same IP address.
2. Network storm: a device hits an error condition that causes it to send out a large, rapid burst of broadcast messages, effectively clogging the router.
3. A device with high priority starts a large upload or download (this requires that the router supports traffic prioritization) and consumes a large portion of the available bandwidth.
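To make the duplicate-IP case concrete, here is a minimal Python sketch. It assumes you have copied (IP, MAC) pairs by hand from the router's client list or an ARP table; the function name and the sample addresses are made up for illustration. It flags any IP address that is claimed by more than one MAC address.

```python
from collections import defaultdict

def find_duplicate_ips(pairs):
    """Return {ip: sorted list of MACs} for every IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in pairs:
        # Normalize case so the same MAC written two ways counts once.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Made-up sample data, not taken from any real network.
clients = [
    ("192.168.0.102", "AA:BB:CC:00:00:01"),
    ("192.168.0.103", "AA:BB:CC:00:00:02"),
    ("192.168.0.102", "AA:BB:CC:00:00:03"),
]

conflicts = find_duplicate_ips(clients)
for ip, macs in conflicts.items():
    print(f"{ip} is claimed by: {', '.join(macs)}")
```

If nothing is printed for your own list, a duplicate address is probably not the culprit.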
Thanks a bunch for being willing to go super basic for me haha. Thankfully we usually avoid #1 and #3 -- we haven't assigned any static IPs (I was going to for the Hubitat but was advised against it), and I usually throttle my FTP bandwidth if I'm working and have to upload/download a lot. Which I haven't the last few days.
I'll have to look more closely at the logs for #2. When it was trying to access the PowerView and failing, it did send out a flurry of warnings, but it looked like it capped it at about a dozen. Maybe I need to enable debugging on more devices for a while.
You could also look at your router logs to see if you’re getting a lot of Denial of Service (DoS) attacks from the outside that run for an extended time, and check with your ISP for any outages or service degradations that correspond. Sometimes the connection is there but the name resolution service (DNS) is unavailable, which means that unless you have the internet IP address for where you’re going, you’re stuck (sort of like your GPS going out in an unfamiliar city: the road is there, but finding your way is challenging).
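A rough way to tell "DNS is down" apart from "the whole connection is down" from a laptop is sketched below in Python. It tries to resolve a hostname, then separately tries to open a TCP connection straight to an IP address, which needs no name lookup. The function name, the return labels, and the choice of 1.1.1.1 on port 443 as the probe target are assumptions for illustration; any public IP you trust to be reachable would do.

```python
import socket

def diagnose(hostname, probe_ip="1.1.1.1", port=443, timeout=3.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify the connection as 'ok', 'dns-failure', 'link-down', or 'offline'.

    resolve and connect are parameters only so they can be swapped out
    when testing; by default the real socket functions are used.
    """
    # Step 1: can we turn the name into an address?
    try:
        resolve(hostname, port)
        dns_ok = True
    except socket.gaierror:
        dns_ok = False

    # Step 2: can we reach a known IP directly, with no name lookup involved?
    try:
        conn = connect((probe_ip, port), timeout)
        conn.close()
        link_ok = True
    except OSError:
        link_ok = False

    if dns_ok and link_ok:
        return "ok"
    if link_ok:
        return "dns-failure"   # the road is there, but the "GPS" is out
    if dns_ok:
        return "link-down"     # the name resolved (possibly from a cache) but nothing is reachable
    return "offline"
```

Running something like `diagnose("api.ps-smartfeed.cloud.petsafe.net")` while a dropout is happening and getting "dns-failure" back would point at name resolution rather than the connection itself.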
It isn't so much of an IP clash we are worried about, it is more if things "move around". If you are trying to access the hub at 192.168.0.102, and it isn't there anymore, that could be annoying. You have to search for the hub and input the new address.
But if you add something like a Lutron Bridge, or Alexa into the mix where they actively NEED to know where the hub is, then chaos ensues when they can't find the hub.
I don't suppose any of this means anything to you? This is just the Hubitat and router logs from 4:19pm today. Obviously, having to suppress 10k messages 10 or more times a minute is a problem. How can I tell what's causing it?
Hubitat:
app:152 2021-02-03 04:19:20.102 pm error java.net.UnknownHostException: api.ps-smartfeed.cloud.petsafe.net: Temporary failure in name resolution on line 399 (refreshDevices)
Router:
Feb 3 16:19:01 kernel: net_ratelimit: 10661 callbacks suppressed
Feb 3 16:19:04 WLCEVENTD: eth6: Assoc F8:36:9B:83:62:8F
Feb 3 16:19:06 kernel: net_ratelimit: 10719 callbacks suppressed
Feb 3 16:19:11 kernel: net_ratelimit: 10823 callbacks suppressed
Feb 3 16:19:16 kernel: net_ratelimit: 10877 callbacks suppressed
Feb 3 16:19:21 kernel: net_ratelimit: 10795 callbacks suppressed
Feb 3 16:19:26 kernel: net_ratelimit: 10618 callbacks suppressed
Feb 3 16:19:31 kernel: net_ratelimit: 10707 callbacks suppressed
Feb 3 16:19:36 kernel: net_ratelimit: 10585 callbacks suppressed
Feb 3 16:19:41 kernel: net_ratelimit: 10405 callbacks suppressed
Feb 3 16:19:46 kernel: net_ratelimit: 10721 callbacks suppressed
Feb 3 16:19:52 kernel: net_ratelimit: 10518 callbacks suppressed
Feb 3 16:19:52 WLCEVENTD: eth6: Assoc 2C:FD:A1:62:9C:E9
Feb 3 16:19:52 WLCEVENTD: eth8: Assoc 2C:FD:A1:62:9C:EC
Feb 3 16:19:52 kernel: wfd_registerdevice Successfully registered dev wds0.0.12 ifidx 2 wfd_idx 0
Feb 3 16:19:52 kernel: Register interface [wds0.0.12] MAC: 60:45:cb:d0:a6:30
Feb 3 16:19:53 kernel: wfd_registerdevice Successfully registered dev wds2.0.6 ifidx 3 wfd_idx 2
Feb 3 16:19:53 kernel: Register interface [wds2.0.6] MAC: 60:45:cb:d0:a6:38
Feb 3 16:19:54 WLCEVENTD: eth6: Disassoc 2C:FD:A1:62:9C:E9
Feb 3 16:19:54 kernel: wfd_unregisterdevice Successfully unregistered ifidx 2 wfd_idx 0
Feb 3 16:19:56 kernel: wfd_unregisterdevice Successfully unregistered ifidx 3 wfd_idx 2
Feb 3 16:19:56 WLCEVENTD: eth8: Disassoc 2C:FD:A1:62:9C:EC
Feb 3 16:19:56 WLCEVENTD: eth6: Assoc 2C:FD:A1:62:9C:E9
Feb 3 16:19:56 kernel: wfd_registerdevice Successfully registered dev wds0.0.12 ifidx 2 wfd_idx 0
Feb 3 16:19:56 kernel: Register interface [wds0.0.12] MAC: 60:45:cb:d0:a6:30
Feb 3 16:19:57 kernel: net_ratelimit: 10435 callbacks suppressed
Edit: I mean, obviously it makes sense that the SmartFeed error could have triggered it. But why would that throw so many errors? Shouldn't it just throw a few?
Yeah, ASUS router. Doing some Googling, it's possible to put some setting lower so it won't throw so many records. Worth a try!
I'm noticing that there's a certain MAC address that throws a bajillion errors every time it's "associated." It's not the Hubitat, though, and it's not currently connected anywhere so I'm not entirely sure what it is.
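One way to check that kind of hunch without eyeballing the log is a small Python sketch like the one below. It assumes the router log has been saved as plain text in the same format as the excerpt earlier in the thread (the sample lines are copied from it); the function name and regular expressions are my own. It adds up the "callbacks suppressed" counts and tallies Assoc/Disassoc events per MAC address. As far as I understand it, the net_ratelimit number counts kernel log messages that were suppressed, not network packets, so treat it as a rough indicator only.

```python
import re
from collections import Counter

# Patterns matched against lines like the router log excerpt in this thread.
SUPPRESSED = re.compile(r"net_ratelimit: (\d+) callbacks suppressed")
WIFI_EVENT = re.compile(r"WLCEVENTD: (\S+): (Assoc|Disassoc) ([0-9A-Fa-f:]{17})")

def summarize(lines):
    """Return (total suppressed callbacks, Counter keyed by (MAC, event kind))."""
    suppressed_total = 0
    events = Counter()
    for line in lines:
        match = SUPPRESSED.search(line)
        if match:
            suppressed_total += int(match.group(1))
            continue
        match = WIFI_EVENT.search(line)
        if match:
            _iface, kind, mac = match.groups()
            events[(mac.upper(), kind)] += 1
    return suppressed_total, events

sample = """\
Feb 3 16:19:01 kernel: net_ratelimit: 10661 callbacks suppressed
Feb 3 16:19:04 WLCEVENTD: eth6: Assoc F8:36:9B:83:62:8F
Feb 3 16:19:06 kernel: net_ratelimit: 10719 callbacks suppressed
Feb 3 16:19:52 WLCEVENTD: eth6: Assoc 2C:FD:A1:62:9C:E9
Feb 3 16:19:54 WLCEVENTD: eth6: Disassoc 2C:FD:A1:62:9C:E9
Feb 3 16:19:56 WLCEVENTD: eth6: Assoc 2C:FD:A1:62:9C:E9
""".splitlines()

total, events = summarize(sample)
print(f"suppressed callbacks: {total}")
for (mac, kind), count in sorted(events.items()):
    print(f"{mac} {kind}: {count}")
```

A MAC that keeps flipping between Assoc and Disassoc in a short window, like the 2C:FD:A1 one here, is a reasonable first suspect to look up in the router's client list.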
Hmm, maybe I need to look at our other ASUS routers that we were using with their mesh system. Thanks for the hints.
Edit: Narrowing things down. Two MAC addresses appear to be associated in some way with the PetSafe Smart Feeder (that's the Shenzhen device). It threw a load of errors even when I opened the app on my phone. I completely reset the feeder and reconnected it to Wi-Fi, and haven't seen the same thing since. I also noticed that the router firmware hadn't been updated in a while, and it was acting a little funny... I manually updated to the latest stable release (which came out last week, incidentally), and errors within Hubitat don't seem to affect the router logs at all. So, long story short, it might have just been a router problem. TL;DR: I think the "storm" concept is what was happening. Thank you for showing me where and how to troubleshoot!
I think I disagree on the use of the network by Hubitat. While it may not be a factor in the OP's issue, the hub logs all events to the local network. I say this because when I use Node-RED to capture events, they all come through the router. I believe the events are published whether something is listening or not, but I'm not positive.
The only way Node-RED would see events is if you have Maker API running and are subscribing to them through it. If you’re not running an external application (e.g. Node-RED) that requests services, there is no traffic. Even if you have Maker API installed, it just sits there idle until you make a request, whether one-time or a subscription.