I blogged in 2013 about a bug in BT-supplied FTTC/VDSL modems (here). BT and the manufacturer fixed that modem, but sadly the same bug seems to exist in other modems.
We're seeing this in some Zyxel modems now. I am pretty sure I have had reports of other modems as well. The problem is that it is not always obvious what the issue is: people turn things off and back on again, and that fixes it, so we do not always get clear reports of issues.
The issue is that the modem seems to have some sort of packet acceleration in the chipset - for what reason we cannot imagine - and it seems that it caches around 254 different IP/port/protocol sets. This is a bit like header compression by the sound of it.
This means that headers for packets matching the cache are reconstructed, which would be fine if not for a bug. It seems that when passing PPPoE the IP/port/protocols are matched but the PPPoE ID field is not, yet this is part of what is reconstructed.
This means that on any fixed IP line (IP being one of the things that must not have changed to hit the cache), a PPP restart leaves any packets matching the cache from the previous PPPoE session not working (they are sent down the line with the PPPoE ID of the previous session, and so are dropped).
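To make the failure mode concrete, here is a minimal sketch of the behaviour as described above. It is only a model built from the observed symptoms, not the modem's actual firmware: the class and function names are hypothetical, the addresses are documentation ranges, and the session IDs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """What the cache appears to match on: IPs, ports and protocol only."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class BuggyAccelerator:
    """Hypothetical model: caches a header per flow, ignoring the PPPoE ID."""

    def __init__(self):
        # FlowKey -> PPPoE session ID captured when the flow was first seen
        self.cache = {}

    def forward(self, key, current_session_id):
        """Return the PPPoE session ID that ends up on the wire."""
        # The suspected bug: the lookup ignores the session ID, yet the
        # cached session ID is part of the header that gets reconstructed.
        if key not in self.cache:
            self.cache[key] = current_session_id
        return self.cache[key]

    def ethernet_port_reset(self):
        """Unplugging and re-plugging the Ethernet cable clears the cache."""
        self.cache.clear()


modem = BuggyAccelerator()
voip = FlowKey("192.0.2.10", "198.51.100.5", 5060, 5060, "udp")

first = modem.forward(voip, 0x1111)  # original session: correct ID is sent
stale = modem.forward(voip, 0x2222)  # after a PPP restart: old ID is still sent
modem.ethernet_port_reset()
fresh = modem.forward(voip, 0x2222)  # after the port reset: new ID is sent
```

In this model `stale` carries the previous session's ID, which is why the far end drops the packet, while a flow with a different IP/port set would get a new cache entry and work normally.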
For a lot of people this does not matter - dynamic IP has no issue, and even a lot of Internet traffic has different IP/ports, e.g. accessing web pages, so just work.
Sadly there are a number of protocols that easily break, including VPNs like IPsec, and VoIP. So a simple PPP restart on a line causes things not to work. The other issue is that these protocols tend to keep trying, so the cache never clears or times out. It just stays not working.
Resetting the modem is one (slow) way to fix it, but all it actually needs is the Ethernet port to be reset - just unplug the cable and re-plug it. This causes the modem to clear the cache (why it does this on a port reset and not on seeing a PADI, I have no idea, but bugs are funny like that).
When this was one modem and the issue was fixed, that was fine. Now that it seems to be more modems, and not fixed, it needs some workarounds. So today I have added an Ethernet port reset feature to our FireBricks so that the port connected to the modem is reset for a second when PPPoE shuts down.
It is a bodge to work around someone else's bug, but it is a pragmatic step. It is in the latest FireBrick alpha release for testing.
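The logic of the workaround is simple enough to sketch. This is not the FireBrick code or its configuration, just an illustration of the idea under stated assumptions: `reset_port_on_pppoe_down`, `set_link` and the `sleep` parameter are hypothetical names, and the caller is assumed to invoke the function when the PPPoE session goes down.

```python
import time
from typing import Callable


def reset_port_on_pppoe_down(
    set_link: Callable[[bool], None],
    hold_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drop the Ethernet link facing the modem, wait, then restore it.

    set_link is a hypothetical callback that forces the port down (False)
    or up (True); sleep is injectable so the sequence can be tested
    without actually waiting.
    """
    set_link(False)
    try:
        sleep(hold_seconds)
    finally:
        # Always bring the port back up, even if the wait is interrupted.
        set_link(True)


# Record the sequence of actions instead of touching real hardware.
events = []
reset_port_on_pppoe_down(events.append, hold_seconds=1.0, sleep=events.append)
```

Here `events` records link down, the one second hold, then link up, which is the same effect as pulling the cable and plugging it back in.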
What fun?!?
P.S. Until now people have used the profile feature of the FireBrick to reset the port when needed because a VPN would not come up due to this modem bug. This is just how flexible the FireBrick is, but it was not easy to do for VoIP related issues, and so I felt it was time for some special code for this.
I've not run into this on the white BT modems (half a dozen in service) nor the BT Business Hub 5 or above (dozen + in service). All in bridge mode. Did BT fix their modems? I do have one Zyxel modem in service on Warwicknet; interestingly I have had on one occasion a fault where an IPsec VPN died and would not reconnect, however a restart of libreswan fixed it so not clear if I hit this bug?
AFAIK BT fixed theirs. A break in the usage for a while can cause the cache to be cleared / replaced, so stopping an IPsec link and restarting later may do it.
I do ponder how I could ensure the modem got its update though... Nowadays I'm buying these modems on eBay primarily, new in box, bridging them and putting them into service. Presumably that means they are not getting updated? I don't see any option to perform manual software updates...
Looks like this was seen on Zyxel modems starting at least a few years back
"PPPoE Session-ID caching bug (In Bridge mode)"
https://support.aa.net.uk/VMG1312-B10A:_Bugs
Yeh, we are seeing more cases definitely confirmed and different models as well now.
Anything that low-level is more likely to have been written by the chip vendor. At least when I worked on dsl modems a few years ago, the likes of zyxel and BT didn't do much more than the UI on CPEs.
Got hit by that one in a major way a couple of years back. I've not used a zyxel since - I trawl ebay for BT modems as required.
What do ZyXEL have to say about it?
They still refuse to add support for RFC4638 despite it just being a couple lines of code that need changed (the exact same changes on most devices).
https://github.com/Olipro/VMG1312-B10A/releases
And here's the RFC4638 jumbo frames in bridge mode patch for the VMG3925-B10B:
https://github.com/trejan/VMG3925-B10B
YMMV but it works fine on my modems, a FB2900 (with the Ethernet port reset feature mentioned above enabled) and bonded AAISP Soho::1 lines.