Zigbee Radio gets stuck on Initializing, and lots of ZigbeeOff events happening when radio is not off

Platform version 2.3.8.134
Hardware version C-8

I am not sure if it is a coincidence, but the power company replaced the transformer on the pole outside my house, and the next morning Zigbee was down. On the Zigbee page, it was stuck on a status of Initializing... I could not fix it by rebooting the radio; I had to reboot the hub.

It has been happening every 1-2 days since, for about a week. I created a Webcore piston to reboot the hub on ZigbeeOff last night, in case it happened again, and I found this morning that the hub had rebooted about every 20 minutes through the night.

I decided to watch the Zigbee logs next time it happened, so I set a voice command to tell me when the ZigbeeOff event happened, and I removed the reboot so I could see what was happening.

What I found is that every time I get a ZigbeeOff event, it is because device 0000 has logged activity in the Zigbee Log. Zigbee is not actually down when I get this event. I read in a forum that device 0000 is the hub itself. What is the hub doing that it logs to Zigbee, and is it expected to cause a ZigbeeOff event? I assume not, and maybe this tells me something about my issue.

I am still waiting to catch when the Zigbee radio actually stops and goes to the broken Initializing state, but I am wondering if the ZigbeeOff events are related to that issue.

I have now changed my piston to wait one minute after a ZigbeeOff event, and then check a few Zigbee devices to see if they are inactive before rebooting the hub. I would like to find the root cause of my Zigbee radio getting stuck with Initializing, but it is hard to troubleshoot when I am getting all these (bogus?) ZigbeeOff events.
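
For anyone following along, the check described above can be sketched in Python. This is not webCoRE code, only a rough illustration of the decision the piston makes; `RecoveryDecision`, `decide_after_zigbee_off`, `read_states`, and the status strings are hypothetical stand-ins for however the piston actually reads device status.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class RecoveryDecision:
    reboot: bool
    reason: str


def decide_after_zigbee_off(
    read_states: Callable[[], Iterable[str]],
    wait: Callable[[float], None] = time.sleep,
    wait_seconds: float = 60.0,
) -> RecoveryDecision:
    """Wait after a ZigbeeOff event, then reboot only if every checked device is inactive."""
    # Give the radio time to recover on its own before looking at devices.
    wait(wait_seconds)
    # Read the device statuses only after the wait, so they reflect the recovered state.
    states = [s.strip().lower() for s in read_states()]
    if not states:
        return RecoveryDecision(False, "no devices checked")
    if all(s == "inactive" for s in states):
        return RecoveryDecision(True, "every checked device is inactive")
    return RecoveryDecision(False, "at least one checked device is still active")


if __name__ == "__main__":
    # Dry run with a no-op wait so the example does not block for a minute.
    decision = decide_after_zigbee_off(lambda: ["ACTIVE", "INACTIVE"], wait=lambda s: None)
    print(decision)
```

Passing the wait function in as a parameter is only there so the logic can be dry-run without actually sleeping for a minute.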

Edit: This is the event I get in the Zigbee log when I get a ZigbeeOff event:

I wonder if it might have to do with something like the utility putting in automatic meter reading, or some other control scheme?

The transformer story:

I am sure that nothing was done but a transformer change, and I have had automatic meter reading for years.

We were told one morning that we were going to lose power for half an hour since the transformer on our pole needed to be replaced, due to a new restaurant opening next door that was on our transformer. Apparently they have a 3-phase pizza oven that they are running through a phase converter on single phase. When they turned it on, all their lights dimmed, and the power company determined that the transformer that we share with them was too old and too small, and that they almost blew it with the amps from the oven. So they replaced the old one with a new, larger transformer.

That night, we found all the LED lights in our house were slightly pulsating/dimming about twice per second. The next morning I woke up to Zigbee down and Initializing. I called the power company and we were able to link it to when the restaurant had their pizza oven on, so that was the cause. I assumed the phase converter next door and our pulsating lights were messing with the Zigbee radio.

That was actually two weeks ago. The power company resolved our issue last week by giving the restaurant their own transformer on the next pole down, so it got them off our transformer. The lights are no longer pulsating when their oven is on, and after a week of Zigbee radio issues stuck on Initializing most mornings, I was sure getting them off our transformer would resolve the Zigbee issues.

The next morning Zigbee was down again and stuck on Initializing, and it has been continuing to happen. None of this was an issue before the initial transformer change. With notice of the power outage, I had shut down the hub via the UI before they cut power, and I unplugged the power from the hub before it came back on. I brought the hub back up after the power was restored. Regardless, we have had unplanned power outages before and the hub had no issues after power was cut and then turned back on, so I do not think the power outage itself had anything to do with it.

So the question is, can a transformer, which is about 50 feet away from my Hub, cause the issues I am seeing? The old transformer was smaller. They left the new, larger transformer on the pole instead of replacing it with a smaller one again. My hunch is that this new, larger transformer is giving off some RF frequency the smaller one did not, but I have no way to prove that, and I'm not sure they would return us to a smaller transformer now even if I asked. They originally were going to put a smaller one back up, but when they arrived I was in a work meeting (work from home) and when I told them they could not cut the power at that time, they decided to just leave the larger one there instead of downsizing it.

However, if I was getting random ZigbeeOff events, and they were not tied to the hub doing something as device 0000 when it happens, I could blame the transformer. But why am I getting the ZigbeeOff events only when the hub does some Zigbee thing that it is logging?

Edit: I just looked at timestamps, and the 0000 Hub event in the Zigbee log comes AFTER the ZigbeeOff event. Maybe this event is the hub restarting the radio? It now seems to be more of a response than a cause, but I would like to know what this means.


ZigbeeOff

Edit: I just had the Zigbee Network Graph up when I got a ZigbeeOff event. I got a popup saying it could not get Zigbee info. I clicked OK and the graph did not change, so it recovered OK, but it looks like about a 10 second period with no Zigbee when this happens.

It recovers 99% of the time. When it does not, and the radio is left broken sitting on Initializing, it seems to only happen between 4am and 6am.

I was waiting for suggestions for a fix and details on that hub event, but I have not heard much.

The only thing I can think of is to change the Zigbee Channel - but that is disruptive so I have been waiting to try that. I am hesitant to move the hub given it is in a perfect central location in the house in a hall closet with ethernet, and moving it further away from the transformer also moves it further away from the majority of my Zigbee Devices.

This still goes back to the idea that the new transformer is the issue, which I cannot say for sure except that is when the issues started. I guess nobody else has experienced this type of behavior or found a solution.

Note: The hub must have been stuck on Initializing this morning, as I had a hub reboot at about 4am. Good to know my Piston is working to fix the issue with a reboot, but this is not the way I want to run the hub going forward.

When you get the error in your logs before the zigbee off event, does clicking on the "Info" box in the log get you to a device page?

Changing Zigbee channel is usually pretty straightforward, you may need to help some devices reconnect w/a restart of the device or re-pair, but most of them should move over on their own. What is your current Zigbee channel and power setting?

Not familiar enough w/transformers/related to comment on that. Tagging @support_team to see if someone can help.

When I click on info, I get "Device 0 does not exist on this hub. It was most likely deleted at some point. Please update your links"

Note that I found I get the 0000 log entry with a timestamp AFTER the zigbeeOff event, so it seems to be a reaction, not a cause.

I have changed channels before, and it did take a while for everything to come back, which is why I am holding off: it will kill a lot of automations. Maybe I will change channel tonight before bed.

Currently, I am on channel 20 and power 12.

I think device 0000 would normally be the hub's Zigbee controller. So if it's happening right after a Zigbee off/on event, it might be part of the controller "waking up." Not sure why you would get the "...does not exist..." message. :man_shrugging:

Changing them in the later evening is probably a good idea, give things time to settle down to a certain extent overnight.

One of our members had some comments:

Transformers operate at 60 Hz obviously. I didn't calculate what harmonic of 60 Hz would land at 2.4 GHz, but it's a lot of zeros :smiley: And in general, higher harmonics carry less power, and so less range. I'd say it's unlikely the RF path is the right tree to bark up. :smiley:
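
For the curious, the harmonic arithmetic is easy to run. The sketch below assumes the standard IEEE 802.15.4 channel plan for the 2.4 GHz band (channel k centered at 2405 + 5(k - 11) MHz, for k from 11 to 26) and uses channel 20, the one mentioned earlier in the thread; the function names are made up for illustration.

```python
MAINS_HZ = 60.0  # mains frequency quoted in the thread


def zigbee_channel_center_hz(channel: int) -> float:
    """Center frequency of a 2.4 GHz IEEE 802.15.4 channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz Zigbee channels run from 11 to 26")
    return (2405 + 5 * (channel - 11)) * 1e6


def harmonic_number(target_hz: float, fundamental_hz: float = MAINS_HZ) -> float:
    """Which multiple of the fundamental would land on the target frequency."""
    return target_hz / fundamental_hz


print(f"{harmonic_number(2.4e9):,.0f}")  # 40,000,000
print(f"{harmonic_number(zigbee_channel_center_hz(20)):,.0f}")  # 40,833,333
```

So a 60 Hz source would need roughly its forty-millionth harmonic to reach the Zigbee band, which is the "lot of zeros" referred to above.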

Power line issues, those seem more likely. What's unclear about the description is how it's affecting JUST the hub. All those tiny Zigbee modules inside each device seem to have as much a chance of being disrupted by power line issues, perhaps more because of the physical spacing between the AC line and the Zigbee power supply. I would not expect the power supply buried deep within the Smarts of a switch/dimmer/outlet to be any kind of exceptional.. in fact, it's probably just barely adequate. By that I mean power line issues will be more likely to get past the filtering in a Zigbee device than the Hub's external wall wart, which has a lot more room for decent filter capacitors.

It could be as simple as a loose ground on the new transformer.. I'd focus on transformer installation/human error LONG before I'd chase RF. But, I could be wrong. :smiley:

From my own reading up on distribution transformers as well, they do not emit much in the way of RF. However, that transformer is wired into every wall of my house, so it may be causing interference through the power lines. From what I am reading, if it is powerline interference it would be some plugged-in device in my house that would create the RF interference when powered on, or even just plugged in. Again, this is a different sized transformer than we had before, technically overpowered for only two residential houses connected to it now. It was sized for two houses and a restaurant with an industrial pizza oven.

Now I'm wondering if something that cycles power in my house is causing it, in conjunction with the new transformer? The fridge and furnace are the only things I can think of that cycle throughout the day. However, the way I lose the radio completely only in the early morning sounds like something tied to the power grid and demand. It is also inconsistent in frequency through the day. Tonight I had ten zigbeeOff events between 4pm and 7pm, some within 5 minutes of each other. For the last hour, I have had none. Seems like power grid dinner activity gearing up and calming down?

I am really at a loss at this point. Maybe I should call the power company and ask them to put the proper sized transformer back up, it would at least provide me with a troubleshooting step to eliminate the over-sized transformer as a cause.

I doubt they would honor your request. A transformer just transforms one voltage to another, via two coils and induction between the two. Depending on the number of turns on the primary and secondary coils, voltage will either be increased or decreased on the secondary. The new, "larger" transformer has exactly the same ratio of primary/secondary coils as the older one; the only difference is in capacity, i.e. in the number of amps it can safely deliver. There really is no "proper" size transformer. The power company won't intentionally install a transformer that can't handle the anticipated load, but a larger capacity transformer isn't really "improper" other than perhaps being a waste of money for the power company.
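
To put rough numbers on that, here is a small sketch using the ideal-transformer relations (secondary voltage scales with the turns ratio, and single-phase full-load current is capacity divided by voltage). The 7200 V primary, 30:1 turns ratio, and 25/50 kVA ratings are illustrative assumptions only, not figures from this thread or from the utility.

```python
def secondary_voltage(primary_v: float, primary_turns: int, secondary_turns: int) -> float:
    """Ideal transformer: Vs = Vp * Ns / Np."""
    return primary_v * secondary_turns / primary_turns


def full_load_current_a(capacity_kva: float, secondary_v: float) -> float:
    """Single-phase full-load current on the secondary: I = S / V."""
    return capacity_kva * 1000.0 / secondary_v


# Hypothetical example: same turns ratio, two different capacity ratings.
for kva in (25, 50):
    volts = secondary_voltage(7200.0, 30, 1)
    amps = full_load_current_a(kva, volts)
    print(f"{kva} kVA unit: {volts:.0f} V secondary, about {amps:.1f} A at full load")
```

With the same turns ratio, both units deliver the same secondary voltage; only the current they can safely supply changes.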

I guess anything is possible, but I can't see how having excess capacity could be a bad thing and would cause the kind of problems you're seeing. When certain things turn on (motors, compressors, etc.), the inrush of current can cause a momentary voltage drop, but a larger capacity transformer would make this less likely to occur.

However, if you have one or more bad connections in your house, that could be the problem. If you are experiencing voltage drops, you might have the power company check your connections at the meter socket (most likely they already examined and/or redid the connections at the pole). You may also want to have an electrician examine the connections inside your circuit breaker panel, especially if you have aluminum wiring.

If you really want to try to get to the bottom of this yourself, something like this may help: Pokit Innovations (pokitmeter.com). I bought both the Pokit Pro and Pokit Meter when they were on Kickstarter but haven't used them much as the project I was going to use them on keeps getting pushed to the back burner. But they do work, and you can do long term data capturing with them. You could monitor your line voltage and if there are regular drops at certain times, you might be able to correlate those with events in your home. Or if not, at least you might be armed with enough information for the power company to investigate.
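
Once voltage samples are captured, correlating them with hub events could look something like the sketch below. It assumes the logger can export (timestamp, volts) pairs; the 120 V nominal, 5% tolerance, and 30 second window are placeholder values, and `find_dips` and `events_near_dips` are hypothetical helper names, not part of any Pokit software.

```python
from datetime import datetime, timedelta


def find_dips(samples, nominal_v=120.0, tolerance=0.05):
    """Return timestamps of samples that fall below nominal minus the tolerance.

    samples: iterable of (datetime, volts) pairs (assumed export format).
    """
    threshold = nominal_v * (1.0 - tolerance)
    return [stamp for stamp, volts in samples if volts < threshold]


def events_near_dips(event_times, dip_times, window=timedelta(seconds=30)):
    """Return the events that occurred within `window` of any voltage dip."""
    return [e for e in event_times if any(abs(e - d) <= window for d in dip_times)]


if __name__ == "__main__":
    start = datetime(2023, 1, 1, 4, 0, 0)  # synthetic data for illustration
    samples = [
        (start, 121.0),
        (start + timedelta(minutes=1), 109.5),
        (start + timedelta(minutes=2), 120.2),
    ]
    zigbee_off = [start + timedelta(minutes=1, seconds=10), start + timedelta(minutes=10)]
    print(events_near_dips(zigbee_off, find_dips(samples)))
```

If few or none of the zigbeeOff events line up with dips, that would point away from line voltage as the trigger.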

I agree, no reason a larger transformer would be any different in actuality than the smaller one. I am just stuck on how this issue seemed to have started the day that transformer went up.

The funny thing is, we now have cleaner power than we had before. I used to see some very slight flickering in our bedroom ceiling LEDs when on lowest setting, but that is actually gone now with the new transformer. Seems like the power is cleaner, if anything. Power company checked wiring and voltage at the box when they were here. The house was completely rewired during a reno a few years ago, and we have had no issues. All new wiring, outlets and switches, and even new supply wire to the pole as I buried my supply line during the reno. So this is not a case of questionable old house wiring.

Having never subscribed to zigbeeOff before, for all I know this has been happening since I got the new C-8 hub in January. I just know it has never gotten stuck on Initializing in the morning with the radio down. I have had Zigbee issues with the new hub since day one, but so have many others. Sengled bulbs continue to be unreliable, and any time I need to pair or re-pair a Zigbee device I have to fight to get it connected. It usually takes several tries and a network rebuild to get it connected. I've spent 20 minutes trying to get a motion sensor reconnected.

So maybe the hub has always had issues, and it has manifested into this new issue of the radio going down in the mornings. Maybe the transformer introduced a slight new variable that made an existing issue worse.

It is tempting to power up my C-7 hub and see if it is getting the same thing, since I never had issues with Zigbee on my C-7. Maybe this really is a hub issue, though it is strange that overall I have lots of Zigbee devices, and they generally stay connected and work, even through the zigbeeOff events.

Given that you are seeing these events quite frequently, within the span of a couple of hours, run this test to take the transformer completely out of the equation: use a UPS (fill up the battery, then disconnect it from mains so it's running on battery) or even a decent portable charger (like one of those that are used to recharge cell phones) and run your hub off that.

Not a bad idea, I will try that. Since things are generally working, I'm OK gathering some data for a bit.

I also had a reboot again this morning. I'm losing my Webcore logs at reboot so I just started sending logs to a Google sheet. I will be able to graph when I get zigbeeOff, and reboots. Things have been quiet so far this morning, no zigbeeOff events in the last hour, but had about two per hour from 4am to 7am. The piston restarted around 4am so I know I had a reboot. Seeing this graphed could reveal something.

Edit:
Here is a live link to the chart I set-up. It should be current whenever loaded.
Zigbee Chart
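
For anyone wanting to do the same kind of graphing offline, a minimal sketch of bucketing logged events by hour is below. It assumes the sheet can be exported as rows of (ISO timestamp, event name); that layout and the `events_per_hour` helper are assumptions for illustration, not the actual structure of the linked sheet.

```python
from collections import Counter
from datetime import datetime


def events_per_hour(rows, event_name="zigbeeOff"):
    """Count occurrences of one event per clock hour.

    rows: iterable of (iso_timestamp, event) pairs (assumed export layout).
    Returns a dict mapping the start of each hour to the event count.
    """
    counts = Counter()
    for stamp, event in rows:
        if event == event_name:
            hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Synthetic rows for illustration only.
    rows = [
        ("2023-01-01T04:05:00", "zigbeeOff"),
        ("2023-01-01T04:50:00", "zigbeeOff"),
        ("2023-01-01T05:10:00", "reboot"),
    ]
    for hour, count in events_per_hour(rows).items():
        print(hour.isoformat(), count)
```

Calling it a second time with `event_name="reboot"` gives the reboot counts on the same hourly grid, which makes it easier to see whether reboots cluster in the early morning.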

Ok, I just had a fail where the Zigbee radio went to INITIALIZING. My recovery Piston did not get a Zigbee off event. The two sensors I was checking for INACTIVE are still reporting ACTIVE. Zigbee is definitely down though.

Looking at the logs of a few of the Zigbee devices, I am just getting Queue Full errors on the sensor devices every time they report:
[error] java.lang.IllegalStateException: Queue full on line 534

I guess I have not had an INITIALIZING error in days, since my Piston would not have rebooted given there is no zigbeeOff event before the INITIALIZING fail.

I tried Zigbee reboot, and it hangs as usual. When I got to the page, the PAN ID was 0000, the extended PAN ID was null, and it is not showing a channel anymore:

I clicked on "rebuild network" and status went to OFFLINE. I disabled Zigbee and it will not re-enable. Guess it is a hub reboot again.

Now I am not sure how to detect this state to reboot. I also assume the Queue Full errors tell us something, but I do not know what that fail entails.

Can anyone tell me what the Queue Full errors mean? I did see this once a month or so ago, before any transformer action, but I must not have noticed the Zigbee radio was on INITIALIZING, which it most likely was. I rebooted to get Zigbee back up, and figured the Queue Full errors were a one-off.

@bobbyD - any ideas on this? Outside my paygrade.

PM your hub's UID to @support_team, please.

PM'd Support Team.

I also just noticed I did get a zigbeeOff event before the fail at 2:11pm, but since the devices were still reporting ACTIVE it did not reboot. It had been down for an hour when I noticed it. I need a better canary to detect if the hub is in the INITIALIZING state after a zigbeeOff event so I can reboot, but I am not sure what to look for if devices stay ACTIVE.

Good. It's late on a Friday EST, so unfortunately it will likely be after the weekend before you hear back from anyone...

I am not sure... I didn't see anyone suggest using a NEW USB power brick?
I was having all sorts of weird stuff happen, and I changed out the stock 1 amp USB brick
for a 2 amp Samsung USB power brick, which seems to have helped a bunch!
Just a thought, it might be worth a try also.
With all your power outages etc., maybe the USB brick got whacked?

This is normal behavior when the Zigbee radio gets overwhelmed. Messages get backed up until the queue is full.
Look in your device log or Zigbee log and make sure that you don’t have a device spamming the mesh.
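
As a rough illustration of both points: in Java, calling `add` on a capacity-restricted queue that has no free space throws `IllegalStateException`, which matches the error text in the log. The hub's internals are not visible from this thread, so the sketch below only shows the general pattern using Python's standard `queue` module, plus a simple per-device message count for spotting a chatty device. The (device, message) log layout and the helper names are assumptions for illustration.

```python
import queue
from collections import Counter


def enqueue_all(messages, capacity):
    """Push messages into a bounded queue nobody is draining; collect the rejects."""
    q = queue.Queue(maxsize=capacity)
    rejected = []
    for msg in messages:
        try:
            q.put_nowait(msg)  # raises queue.Full once the queue is at capacity
        except queue.Full:
            rejected.append(msg)
    return q.qsize(), rejected


def chattiest_devices(log_entries, top=3):
    """Count log entries per device; log_entries is (device, message) pairs."""
    return Counter(device for device, _ in log_entries).most_common(top)


if __name__ == "__main__":
    print(enqueue_all(range(5), capacity=3))  # (3, [3, 4])
    log = [("A1B2", "report"), ("C3D4", "report"), ("A1B2", "report")]
    print(chattiest_devices(log, top=1))  # [('A1B2', 2)]
```

If one device dominates the count by a wide margin, that is the first one to look at.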

Just remember that the Z-Radios don't get reset unless you power cycle. IF the queue is full, then clearly there's something unusual going on and a shutdown followed by power off for 10 seconds will clear out the SOC chip (Z-radio). Might help and other than the nuisance of the wait, it's not harmful.

At some point I am going to try a Power Bank as suggested and eliminate the mains connection from the brick to test if it is line interference to the hub. If it works off a battery, I won't know for sure if it is the brick unless I test with another brick as well. If I still have the same issue running on battery, I can eliminate the power supply though.

I want to keep the issue as-is until support weighs in, since I am not in a system down state and there still might be something to catch that helps with solving the issue.