I used to have a lot of that. Somewhere in the mesh you have one of three things: a device that doesn't repeat well; a hop that is too far and won't repair (or can't repair for lack of a repeater); or too much load on the hub, so responses are too slow and the sending device freaks out because it doesn't get a response back in time.
You can sometimes find the offending device(s) by spamming on/off and watching the results. Hopefully you're mostly working with devices that support on/off. Turn on descriptive text logging for the devices, then start turning a device on/off every 3 seconds and gradually speed up until you're doing it about twice a second. If the mesh isn't stable, you will start seeing the same descriptive text log entries repeated.
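If you'd rather script the on/off spam than click by hand, here is a rough Python sketch of the idea. It is not tied to any hub's API: `send` is a hypothetical callable that you would have to wire up to however your hub accepts commands, and the `(device, text)` log format fed to `find_repeats` is also just an assumption for illustration.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def toggle_intervals(start: float = 3.0, end: float = 0.5,
                     factor: float = 0.8) -> List[float]:
    """Delays (seconds) shrinking from `start` down to `end` (about 2 toggles/s)."""
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    delays = []
    delay = start
    while delay > end:
        delays.append(delay)
        delay *= factor
    delays.append(end)  # always finish at the fastest rate
    return delays


def spam_toggle(send: Callable[[str], None], delays: Iterable[float],
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Alternate "on"/"off" through `send` (hypothetical), waiting between commands."""
    state = "off"
    sent = 0
    for delay in delays:
        state = "on" if state == "off" else "off"
        send(state)
        sent += 1
        sleep(delay)
    return sent


def find_repeats(entries: Iterable[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Flag log entries that repeat the previous message from the same device.

    Because the commands alternate on/off, two identical messages in a row
    for one device suggest a retried or duplicated frame.
    """
    last: Dict[str, str] = {}
    repeats = []
    for index, (device, text) in enumerate(entries):
        if last.get(device) == text:
            repeats.append((index, text))
        last[device] = text
    return repeats
```

To use it, pass your own command function as `send`, run `spam_toggle(send, toggle_intervals())`, then copy the descriptive text log entries into a list of `(device, text)` pairs and hand them to `find_repeats`. Anything it returns is worth a closer look.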
If it tends to happen with almost every device, the culprit is probably a device that was one of the very first you added, or one that is very close to the hub and handles a lot of routes.
Another thing I've found is that older Z-Wave devices show the "freak out" much more often than Z-Wave+ devices. I'm guessing that's because older Z-Wave devices just panic and retry the same failed message over alternate routes, whereas Z-Wave+ devices send discovery frames instead, which never show up in the logs. Both probably generate a lot of traffic, though.
If that doesn't yield any results, you can start sniffing packets to see what is really happening.