This is driving me nuts.
I have replaced a switch with a new HP 1820-24G, which seems quite a nice switch.
But I started seeing Rx packet errors in the stats. Now, the uplink includes a fibre run, and I had moved the end of that fibre to the loft and put an extension fibre patch lead in line. So naturally, seeing Rx packet errors, I assumed I had screwed up.
I have spent hours on this: new fibre patch leads, cleaning fibre ends, and so on. No joy. Still packet errors. I am coming to the conclusion that the switch is lying to me, especially as it is finally getting to the point where it shows 1/3 of all Rx packets in error, and that would be visible in other ways. Pings are clean, and there are no signs of issues with traffic apart from the reported errors.
But it is not as simple as Rx packet errors.
First off, I did not have a lot on the switch: an "uplink" on port 24, and a "downlink" on port 22 going to the switch in the loft, the APs and a load of other stuff in the house. What was especially odd is that the Rx error counts on both ports (24 and 22) stayed identical to each other!
So, I moved the uplink from 24 to 23. The count on 23 started going up, but the total of 23+24 was the same as the Rx errors on 22!!!
I did the same the other way, moving the downlink from 22 to 21, and again the total Rx errors on the two downlink ports (21+22) was the same as the Rx errors on the uplink.
This really made no sense, and the error rates were low. If I disconnected the downlink, I did not see any errors on the uplink. I had an AP and a laptop connected on another port, so I could confirm all was working. Whether the downlink was connected seemed to determine the Rx error count on the uplink.
I spent ages checking and re-crimping cat5 cables, and cleaning the fibres, and changing patch leads and so on - no luck.
Eventually I decided it was clearly the switch being silly, and went on to the other job: reconnecting my neighbour on port 1 of the switch! This involved a lot of messing about drilling holes and James crawling around in the loft to see where I was poking it through and so on. Eventually I connected port 1.
Now things changed: the uplink Rx errors went through the roof, but the downlink errors did not. They were still low, and no longer matched the uplink. Port 1 showed no Rx errors. But if I disconnect port 1, the uplink Rx errors go back to what they were before, quite low. If I disconnect ports 1 and 22, the uplink errors stop completely.
I have to say WTF?
Update: If I send untagged packets with a 1500-byte payload (1518 bytes total), there are no errors. If I send VLAN-tagged packets with a 1500-byte payload (1522 bytes total), the error count goes up. This happens even when the switch is set to allow jumbo frames. A clue is that if I turn jumbo frames off, it says the MTU is 1518 on all ports, not 1522. It is clearly a bug in the switch.
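The byte counts in the update come from standard Ethernet framing: a 14-byte header and a 4-byte FCS around the payload, plus 4 more bytes when an 802.1Q VLAN tag is present. IEEE 802.3 allows up to 1518 bytes for an untagged frame and 1522 for a tagged one. The sketch below works through those sizes and models the suspected bug as a hypothetical Rx check that applies the untagged 1518-byte limit to every frame. The switch's real firmware logic is not known; `counted_as_error` and its `buggy` flag are illustrative names only.

```python
# Ethernet framing overheads (IEEE 802.3 / 802.1Q), preamble and IFG excluded
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4     # 802.1Q tag: TPID (2) + TCI (2)
FCS = 4          # frame check sequence (CRC-32)

MAX_UNTAGGED = 1518  # standard maximum frame size, untagged
MAX_TAGGED = 1522    # standard maximum frame size with one 802.1Q tag


def frame_size(payload: int, tagged: bool) -> int:
    """Frame size in bytes, as normally quoted for frame-size limits."""
    return ETH_HEADER + (VLAN_TAG if tagged else 0) + payload + FCS


def counted_as_error(size: int, tagged: bool, buggy: bool = True) -> bool:
    """Hypothetical Rx size check.

    buggy=True models the suspected fault: every frame is compared against
    the untagged 1518-byte limit, so a legal full-size tagged frame fails.
    buggy=False applies the correct limit for tagged or untagged frames.
    """
    limit = MAX_UNTAGGED if buggy or not tagged else MAX_TAGGED
    return size > limit


for tagged in (False, True):
    size = frame_size(1500, tagged)
    print(f"tagged={tagged}: {size} bytes, "
          f"buggy check flags it: {counted_as_error(size, tagged)}, "
          f"correct check flags it: {counted_as_error(size, tagged, buggy=False)}")
```

Only the tagged 1500-byte case trips the hypothetical buggy check, which matches what the update describes.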
If the errored packet is not dropped by the switch but forwarded, then surely it would be an error on the Rx of one port and the Tx of the other?
Out of interest, have you tried another SFP module on the fibre side?
Latest firmware...
Are you sending any VLAN-tagged packets? If the switch isn't doing the right thing there, the Ethernet checksums might be failing and being counted as errors.
Any chance these are jumbo packets?
1522 is not "jumbo" when using a VLAN tag, *and* the switch was set to allow jumbo packets anyway.
Off-topic: how do you manage a switch in the loft? I want to put one in our loft, but it gets so hot during the summer that I can't imagine it will last very long before it, or a bad switch-mode power supply, craps out.
It is a fanless switch. The new one claims to use 6W. Not had an issue with overheating (yet).
If it's a software fault, I'd raise a case with HP support. Certainly the last time I dealt with them on a bug, they were actually pretty good and did release a fix.
It sounds an awful lot like a bug we encountered with some ProCurve 2848 switches: http://support.hp.com/gb-en/document/c02597240
Keep us updated. I'm in the market for a new small gigabit switch, so will be following with interest.
Where does the uplink go?
See the PS; it is a red herring.
BTW, does this switch handle 9k packets?
Yes, but it counts 1522 as an Rx error and forwards the packet anyway!
Any resolution for this issue? I am seeing the same behaviour on 2x new 1820-24G (J9980A) switches.
Same here. 2x 1820-24G and same behaviour. But in my case I'd say they are stalling my network.
We used to have a couple of old Cisco 2960 10/100 switches and never had an issue. But now, with both these HP switches, it seems as if the network is very slow. Pings work fine in <1ms and big file transfers don't seem to be a problem. But accessing HTTP services on the servers from the desktops is wayyyy too slow, and I would bet it's got something to do with the switches...
Cheers
Count me in. Two 1820-8G switches here. All VLAN-tagged ports have errors. I have an 1810-8G as well, and it has no problems. This error degrades the network: I get picture breakup when streaming media through the VLAN, but no breakup over the regular LAN. I filed a case with HP, but I may just end up returning these switches and going with another brand.
My issue was the counters being wrong, not actual errored packets.
Same here. Opened a case with HP HK and they replied that the switch is designed to work this way. :(
I am now setting all devices to use an MTU of 1496 to prevent those "error" packets.
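The 1496 figure follows from the same framing arithmetic as the update in the post: subtract the 14-byte Ethernet header, the 4-byte 802.1Q tag and the 4-byte FCS from the 1518-byte limit the switch appears to enforce. A minimal sketch, assuming that 1518-byte limit applies to tagged frames too; `max_mtu` is a hypothetical helper for illustration only.

```python
# Framing overheads in bytes (IEEE 802.3 / 802.1Q)
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4     # 802.1Q tag
FCS = 4          # frame check sequence


def max_mtu(frame_limit: int, tagged: bool) -> int:
    """Largest IP MTU whose Ethernet frame still fits within frame_limit."""
    return frame_limit - ETH_HEADER - (VLAN_TAG if tagged else 0) - FCS


# Assumption: the switch flags anything over 1518 bytes, tagged or not.
print("untagged:", max_mtu(1518, tagged=False))
print("tagged:", max_mtu(1518, tagged=True))
```

The trade-off is that every host sending tagged traffic needs the smaller MTU; full-size tagged frames from any device still at 1500 would continue to be counted.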
It looks as if this is fixed in PT.01.14