Hotspots freeze up

Posted: Fri Mar 03, 2023 6:09 pm
by K1KHJ
Hey all -

I've searched the forum for topics on this but didn't find much new info.

I run three MMDVM hotspots on DMR BM. They're based on the pi zero W. Every so often, maybe once a week, they lock up. Not all of them at once, but they've all done this. I have to unplug the power to restart them.

I've replaced SD cards, flashed a new image (obviously), re-flashed the firmware (even going back a version to see if that helps). I've tried various power supplies, some plugged into a surge protector, others into a usb port.

I have them set up to reboot every other day, which seems to have prolonged their "lives" but they still lock up.

I don't know what else to check.

Thanks.
Perry

Re: Hotspots freeze up

Posted: Fri Mar 03, 2023 9:32 pm
by KN2TOD
You say they lock up, but are their dashboards still accessible? Or can you get to the command line (SSH) directly in such situations?

It's possible the MMDVMHost program has stalled because it hasn't properly resynced with BM after an internet interruption and/or you've run out of log (tmpfs) space. So, if you can get to the dashboard and/or a command prompt, there are things you can check, and things you can do, to recover without risking a full (abrupt) shutdown.
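
For the log-space check, a minimal sketch along these lines might help if you can still get a command prompt. It assumes the logs live on the filesystem mounted at /var/log and that `df -P` is available; the function names and the 90% threshold are made up for illustration.

```shell
#!/bin/sh
# Print the capacity figure (digits only) from `df -P` output read on stdin.
# POSIX `df -P` prints one header line, then one line per filesystem,
# with the percentage used in column 5.
usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Percentage used on the filesystem holding a path.
# Defaults to /var/log, assumed here to be the tmpfs log area.
log_usage_pct() {
    df -P "${1:-/var/log}" | usage_pct
}

# Example (the 90% threshold is arbitrary):
# [ "$(log_usage_pct)" -gt 90 ] && echo "log space nearly full"
```

If the figure is at or near 100, running out of log space is a plausible culprit; if it is low, the stall probably has another cause.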

Re: Hotspots freeze up

Posted: Fri Mar 03, 2023 10:07 pm
by K1KHJ
No, I can't access anything. It's like it goes off the network. Can't be found. I guess locks up isn't accurate. Dies is more like it.

Although, the router shows them still on the network when this occurs. I'm thinking it's a network thing. All of these hotspots can't be bad.

I got a new router recently. It did this with the old one, too.

Re: Hotspots freeze up

Posted: Sat Mar 04, 2023 2:05 am
by KN2TOD
Ok, so it drops off the network, but does it start advertising an Access Point that you can access?

Do you have a spare monitor and keyboard you can plug into one of the hotspots and leave attached until a failure occurs and see if you can query the system at that point?

And ... are there some nearby sources of interference (motors, LED lights, etc.) or a recent change in the environment that could be impacting the hotspots? Are these failures occurring around the same time of day, or around the same part of the week?

Re: Hotspots freeze up

Posted: Sat Mar 04, 2023 10:01 pm
by K1KHJ
I have two hotspots on the TGIF network. One of them just died, but I can still access it. The log has a lot of entries like this:

E: 2023-03-04 19:01:36.254 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.701 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.707 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.712 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.718 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.723 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.729 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.734 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.740 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.746 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.751 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:37.757 DMR Slot 1, overflow in the DMR slot RF queue
E: 2023-03-04 19:01:38.536 DMR Slot 1, overflow in the DMR slot RF queue

This is interesting because the BM hotspots never let me back in when this happens. They are dead as a doornail until I unplug the power and reinsert it.

I was able to restart it using pistar-mmdvmhshatreset. It seemed okay but I rebooted it anyway thru pistar. I didn't have to unplug it.
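
To see how bursty these errors are, the log lines can be tallied per minute. This is only a sketch: it assumes the line layout shown in the excerpt above (level, date, time, message), and the function name is hypothetical.

```shell
#!/bin/sh
# Count "overflow in the DMR slot RF queue" errors per minute in a log file.
# Expects lines like:
#   E: 2023-03-04 19:01:36.254 DMR Slot 1, overflow in the DMR slot RF queue
# Prints one line per minute: "YYYY-MM-DD HH:MM count".
overflows_per_minute() {
    grep -F 'overflow in the DMR slot RF queue' "$1" |
        awk '{ print $2, substr($3, 1, 5) }' |   # keep date and HH:MM only
        sort | uniq -c |
        awk '{ print $2, $3, $1 }'               # move the count to the end
}

# Example (path is a placeholder; point it at one of the files
# under /var/log/pi-star/):
# overflows_per_minute /var/log/pi-star/your-mmdvm-log-file.log
```

For the excerpt above, every entry falls within 19:01, so the output would be a single line for that minute.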

Re: Hotspots freeze up

Posted: Sun Mar 05, 2023 2:40 am
by KN2TOD
Thank you for your observations. They confirm a suspicion of mine: the first time I ran into the "overflow queue" problem, I was in exactly the situation you find yourself in now. One of my hotspots appeared locked up, but upon further investigation, I found the log files maxed out.

(see <viewtopic.php?p=23011&hilit=overflow#p23011> for more discussion on this.)

Since you are able to invoke pistar-mmdvmhshatreset, you might try this sequence instead (it doesn't require a reboot or unplug):

rpi-rw
sudo systemctl stop mmdvmhost.service
sudo rm /var/log/pi-star/*.log
sudo systemctl start mmdvmhost.service
rpi-ro

This does wipe out the Last Heard list, but at least you can continue on, for a while anyway. A total shutdown (a soft poweroff followed by unplugging the power) also clears the problem, though only for short periods of time.

This overflow problem comes and goes, but does seem to be more pervasive recently; no one in the know has confirmed the problem or offered any solutions or workarounds, so we'll just have to muddle through the occurrences and wait for an answer.

Out of curiosity: which TGs are involved here? How many of them are static? Which BM servers are you using? I'm trying to correlate my observations with yours.

Re: Hotspots freeze up

Posted: Sun Mar 05, 2023 4:12 pm
by K1KHJ
Thanks for the tips.

I'm using 3103.

All of the TGs are static. For BM, I'm using:

91
3100
31656

TGIF:
2021
31665
9050

Re: Hotspots freeze up

Posted: Sun Mar 05, 2023 5:57 pm
by G8SEZ
Might be down to the clocking of the CPU between the Pi and the HAT modem, or possibly problems with the clock on the ADF7021 chip from the TCXO.

Not seen this for a long time but one of my HAT modems needed its TCXO replacing with a better spec one bought from a decent supplier (Digikey or may have been Mouser). Be careful with sine vs square wave output from TCXO and the ubiquitous 10pF coupling cap that is not needed if the TCXO is square wave output.

Re: Hotspots freeze up

Posted: Mon Mar 06, 2023 3:41 am
by KN2TOD
K1KHJ wrote: Sun Mar 05, 2023 4:12 pm All of the TGs are static. For BM, I'm using:

91
3100
31656

TGs 91 and 31656 are frequent "offenders" - there is heavy traffic in both for long periods of time each day, so overflows here are not surprising.

But they come and go; perhaps a minute or two of overflows one day and then several days later a lengthy stream of hits. Unpredictable.

I like your idea of using the HAT reset process to see if it can stem the tide of overflows - hadn't thought of that before. I'll add that to my "tool kit" for working this problem going forward. Thanks!

Re: Hotspots freeze up

Posted: Mon Mar 06, 2023 2:55 pm
by KN2TOD
G8SEZ wrote: Sun Mar 05, 2023 5:57 pm Might be down to the clocking of the CPU between the Pi and the HAT modem, or possibly problems with the clock on the ADF7021 chip from the TCXO.

Not seen this for a long time but one of my HAT modems needed its TCXO replacing with a better spec one bought from a decent supplier (Digikey or may have been Mouser). Be careful with sine vs square wave output from TCXO and the ubiquitous 10pF coupling cap that is not needed if the TCXO is square wave output.

I'm not doubting this explanation, but resoldering or replacing bad chips/boards is not a feasible solution, and it raises the question: are there any software/firmware changes that can be made to detect, correct or mitigate the effects of these errant components?

More generally, are there any steps or processes that can be employed to alert us to deviations and help us recover/reset the components before the situation spins out of control? And having identified possible detection/recovery processes, can they possibly be automated?
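
As one possible starting point for automation, here is a rough sketch of a check that could be run from cron. It is not a known fix: the function names, the 200-line window, and the threshold are all made up, and whether a reset actually clears the overflow condition rests only on the single observation earlier in this thread.

```shell
#!/bin/sh
# Number of overflow errors among the last N lines of a log file
# (N defaults to 200, an arbitrary window).
recent_overflows() {
    tail -n "${2:-200}" "$1" | grep -c -F 'overflow in the DMR slot RF queue'
}

# Run a recovery command once if the recent overflow count reaches a threshold.
# $1 = log file, $2 = threshold, remaining arguments = command to run.
reset_if_overflowing() {
    log=$1
    threshold=$2
    shift 2
    count=$(recent_overflows "$log")
    if [ "$count" -ge "$threshold" ]; then
        echo "overflow count $count >= $threshold, running: $*"
        "$@"
    fi
}

# Example (log path is a placeholder; the threshold of 50 is a guess):
# reset_if_overflowing /var/log/pi-star/your-mmdvm-log-file.log 50 \
#     sudo pistar-mmdvmhshatreset
```

Passing the recovery command as arguments keeps the check itself harmless to test, since you can substitute `echo` for the real reset while tuning the threshold.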