Final packet seems to be dropped or delayed

In the past 6 months, I’ve noticed that downloads tend to hang at 99% complete. I particularly notice this on Linux clients running apt update and dist-upgrade. The same thing happens in updatexlrator, where many packages stay at 99% for a few minutes. I would guess this happens on about 1/3 of the files.

We have limited bandwidth here (satellite, 2 Mbps download) shared by many devices (50+), so performance is always slow, but something definitely seems to be affecting the final packet(s) of a transfer. I haven’t found any clues in the log files, iptables, etc.

It seems to have started around the time that Suricata was added to IPFire, but Suricata is not enabled. I verified that it is still happening today (Core Update 138). I’m not sure how to troubleshoot this; I’m going to look into tshark.

Yes, tshark is a good idea. Please try to capture a trace of a connection and then we can see…

I have made a capture (simultaneously on red/green/host), but I have used Wireshark very little, so I haven’t made much progress analysing the information. I also recorded a 7-minute screen-capture video of the captured apt update, where packages 11, 36, 40 and 41 all show a delay on the last packet.

Since I can’t attach files here (“Sorry, new users can not upload attachments”), I uploaded everything to Google Drive: https://drive.google.com/drive/folders/1-6cgybKT3l_8iNLj3cq0uf_tVifbNNXn?usp=sharing

One potential connection to this issue can be seen by filtering for ip.addr==104.20.94.114, where the host keeps resending what looks like the final ACK; it is received on green but never sent out on red. See 104.20.94.114_finalAckUNSENT.7z.
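In case it helps, the relevant conversation can be pulled out of a capture with a display filter, roughly like this (assuming the capture file names from my P.S. below):

tshark -r /root/green1.pcapng -Y "ip.addr==104.20.94.114"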

A similar situation is seen with ip.addr==91.189.95.83 (I cut off the host capture too early to see the final packets there). I notice that after 2 minutes an ACK is sent to IPFire and a response comes back, but nothing is seen on RED. Of course, packets MIGHT be missing from the capture… see 91.189.95.83_respondLateWithoutRed.7z.

I haven’t uploaded the full red capture since it is large, and providing it might be bad security-wise. Contact me if a copy of that would be valuable.

P.S. Here are my capture filters (a full command sketch follows the list).
◦ host 1.2.3.4 and not broadcast and not multicast
◦ -i green0 -f "host 1.2.3.4 and not broadcast" -W n -w /root/green1.pcapng
◦ -i red0 -W n -w /root/red1.pcapng
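Assuming these were passed to tshark, the complete commands would have been along these lines (1.2.3.4 stands in for the client’s address):

tshark -i green0 -f "host 1.2.3.4 and not broadcast" -W n -w /root/green1.pcapng
tshark -i red0 -W n -w /root/red1.pcapng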

Is this with the IPS enabled or disabled?

[Intrusion Prevention System] STOPPED

Okay, I had a look at the dumps. I think they show pretty much what is going wrong here.

However, I am not sure where I would start looking. It is odd to lose a packet.

Could you disable GSO and the other offloading features for all of them, to rule out any problems with your NIC drivers?
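Something along these lines should do it (red0 and green0 assumed; add blue0/orange0 if you have them):

ethtool -K red0 gso off tso off gro off
ethtool -K green0 gso off tso off gro off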

Do you use the QoS?

Could you try flushing the BADTCP chain?
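That would be something like:

iptables -F BADTCP
iptables -L BADTCP -nv   # should then list no rules in the chain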

These are a couple of ideas. I cannot reproduce this in my environments, and if I could, I am sure I would have noticed this earlier.

Quality of Service: STOPPED
ethtool -K red0 gso off
Cannot get device udp-fragmentation-offload settings: Operation not supported

iptables -L BADTCP

Chain BADTCP (2 references)
target prot opt source destination
RETURN all -- anywhere anywhere
PSCAN tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN,PSH,URG
PSCAN tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN,SYN,RST,ACK,URG
PSCAN tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN,SYN,RST,PSH,ACK,URG
PSCAN tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN
PSCAN tcp -- anywhere anywhere tcp flags:SYN,RST/SYN,RST
PSCAN tcp -- anywhere anywhere tcp flags:FIN,SYN/FIN,SYN
PSCAN tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/NONE
NEWNOTSYN tcp -- anywhere anywhere tcp flags:!FIN,SYN,RST,ACK/SYN ctstate NEW

Flushing that (iptables -F BADTCP) didn’t prevent the problem. I observed last-packet hangs several times afterwards.

Okay. Thanks for testing.

What NICs are those?
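If in doubt, something like this on the console should list them:

lspci | grep -i ethernet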

RED : “pci: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)”

I did a lot of updating during the quiet Christmas season, when the link was mostly idle, and didn’t notice this problem at all then. So I’ll just assume it is a visible symptom of our very under-provisioned bandwidth.

(Solution doesn’t seem to be the best description, but I don’t see any other way to “close” this issue.)
