Our IPsec VPN connection to an Amazon cloud server is always on and for the most part is very solid. Occasionally (about once a month) it will lose connection though. We have a monitor system that alerts us when this happens so we can get right on it. When the connection is lost, simply clicking the RESTART BUTTON (circling arrow icon) under Connection Status and Control is enough to reconnect it. So I set up Dead Peer Detection with the action “Restart”, Timeout of “15” and Delay of “5”. It does not work. When our connection is lost, it stays lost for hours until I manually click the RESTART BUTTON as mentioned above.
Anyone? My understanding is that Dead Peer Detection’s purpose is exactly this: to automate the VPN connection “restart” process in the event the connection is lost. Is this accurate? Does it work for anyone else?
Hm, the dead peer detection action is working fine for a bunch of IPsec N2N connections here.
Could you please post a screenshot of the configured IPsec connection (feel free to redact any sensitive information) and the corresponding log messages in /var/log/messages? The latter start with the charon prefix, by the way.
@pmueller
Sorry, the last time this happened, I was leaving on vacation, and did not have the time it would take to redact.
Actually, this morning the IPSec connection again disconnected and would not reconnect on its own. All it took was for me to click the Restart button next to the connection and it was instantly back up. I will start posting logs and screenshots now:
Good morning Gentlemen
I have been receiving the same VPN disconnects across 3 N2N VPN connections for some time now and have been working hard to isolate the cause.
The Structure is Site 1 > Site 2, Site 1 > Site 3, etc.
The 1 > 2 N2N drops for 2 minutes every 40-45 minutes. It restarts after 2 minutes.
The same happens with 1 > 3, but not concurrently. The timing of the failures is basically the same.
These N2N VPN connections have been in place and trouble free for over 4 years. The trouble started in March when I updated to v166 from v164 on my return from overseas.
Figuring it might be an update problem, I reinstalled v164 on two IPFires (2 & 3), but the problem remains.
I have checked the NTP services and times on each system.
Any suggestions would be appreciated.
Cheers
Gary Stevens
This sounds like a different issue, as my connection NEVER restarts on its own. It requires me to click the restart button in the GUI. Maybe open a different topic?
This is the root cause: the remote end actively closes the IPsec connection for some reason.
It is a bit confusing at first, but dead peer detection does not cover such incidents, as it is only designed to detect when an IPsec connection that should be up is not, and then does whatever it has been configured to do. If an IPsec connection is actively terminated, DPD does not apply.
strongSwan, the IPsec component IPFire comes with, features a closeaction parameter, defining what to do if an IPsec connection is deliberately closed. Currently, it unfortunately cannot be configured from IPFire’s web interface. However, you can place it in /etc/ipsec.user.conf, which is not touched by the web interface, in a scheme like this:
conn MYCONNECTION [or whatever name you gave to the IPsec connection]
    closeaction=restart
Run a
ipsec reload
afterwards to apply these changes. As soon as an IPsec connection is closed, strongSwan will then try to establish it again.
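If you would rather script this step than edit the file by hand, something along these lines might do. It is only a sketch: the function name and its two arguments are made up for illustration, and it simply appends the stanza shown above unless the file already has a section for that connection.

```shell
#!/bin/sh
# Sketch: add a closeaction stanza for one connection to a strongSwan
# user config file. Function name and arguments are hypothetical.
ensure_closeaction() {
    conf_file=$1   # e.g. /etc/ipsec.user.conf
    conn_name=$2   # the name you gave to the IPsec connection

    # Do nothing if a "conn NAME" section is already present in the file.
    if grep -qxF "conn ${conn_name}" "$conf_file" 2>/dev/null; then
        return 0
    fi

    # Append the section header and the indented closeaction parameter.
    printf 'conn %s\n\tcloseaction=restart\n' "$conn_name" >> "$conf_file"
}
```

It could be called as ensure_closeaction /etc/ipsec.user.conf MYCONNECTION, followed by the ipsec reload mentioned above. Calling it a second time leaves the file unchanged.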
We are facing the same issue as Tim (or at least one with very similar consequences).
In fact, the workaround from Peter works. But I’d prefer to solve the interruption of the connection itself, since our users might face disconnection errors of various kinds whenever the connection disappears (broken large downloads, errors in internal applications, etc.).
After looking into our logs, I noticed:
1. It seems that the disconnect begins with our headquarters’ ipfire, which starts creating a rekey job for CHILD_SA.
2. The log of our ipfire in the subsidiary location (configured to always start the connection) and the log of the headquarters’ ipfire (configured for incoming connections) contain several duplicate entries:
Duplicate log lines in subsidiary’s ipfire
Dec 21 16:08:06 cmexternal01 charon: 06[IKE] inbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 06[IKE] inbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] closing CHILD_SA cminternal2psk{1} with SPIs ca569602_i (637157 bytes) c7c38828_o (318135 bytes) and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] closing CHILD_SA cminternal2psk{1} with SPIs ca569602_i (637157 bytes) c7c38828_o (318135 bytes) and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] outbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] outbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Duplicate log lines in headquarter’s ipfire
Dec 21 16:08:06 cminternal2 charon: 12[IKE] establishing CHILD_SA cmexternal01psk{1647} reqid 1
Dec 21 16:08:06 cminternal2 charon: 12[IKE] establishing CHILD_SA cmexternal01psk{1647} reqid 1
Dec 21 16:08:06 cminternal2 charon: 08[IKE] inbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] inbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] outbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] outbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] closing CHILD_SA cmexternal01psk{1645} with SPIs c7c38828_i (318135 bytes) ca569602_o (638595 bytes) and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] closing CHILD_SA cmexternal01psk{1645} with SPIs c7c38828_i (318135 bytes) ca569602_o (638595 bytes) and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
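To find such doubled entries without comparing lines by eye, a small filter like the following might help. It is a sketch only: the function name is made up, and it expects syslog text (for example from /var/log/messages) on standard input.

```shell
#!/bin/sh
# Sketch: print every charon log line that occurs more than once in a row,
# prefixed with the number of repetitions. The function name is hypothetical.
repeated_charon_lines() {
    # uniq -c collapses adjacent identical lines and prepends a count;
    # awk keeps only the lines whose count is greater than 1.
    grep 'charon' | uniq -c | awk '$1 > 1'
}
```

Usage could be repeated_charon_lines < /var/log/messages. Note that lines which are legitimately identical, such as two IKE fragments of the same size sent within the same second, are reported as well.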
A first theory
Point 1 might be intended for security purposes.
Point 2 might be the result of a script or configuration file running the same command twice instead of once, after which one of two identically named channels is closed as unused?
Environment
The ipfire at our headquarters location stayed untouched
The ipfire at our subsidiary location was upgraded
The environment was migrated from 32-bit (the last available version, v169 if I remember correctly) to 64-bit (v171) this week
Configuration has been transferred with backup/restore
rrd files have been removed, sensor data refreshed (as described in the ipfire guideline)
Log of headquarters ipfire
tail /var/log/messages -f -n 39000 | grep "Dec 21 16:0" | grep "charon"
Dec 21 16:08:06 cminternal2 charon: 12[KNL] creating rekey job for CHILD_SA ESP/0xca569602/79.244.65.220
Dec 21 16:08:06 cminternal2 charon: 12[IKE] establishing CHILD_SA cmexternal01psk{1647} reqid 1
Dec 21 16:08:06 cminternal2 charon: 12[IKE] establishing CHILD_SA cmexternal01psk{1647} reqid 1
Dec 21 16:08:06 cminternal2 charon: 12[ENC] generating CREATE_CHILD_SA request 0 [ N(REKEY_SA) SA No KE TSi TSr ]
Dec 21 16:08:06 cminternal2 charon: 12[ENC] splitting IKE message (2973 bytes) into 3 fragments
Dec 21 16:08:06 cminternal2 charon: 12[ENC] generating CREATE_CHILD_SA request 0 [ EF(1/3) ]
Dec 21 16:08:06 cminternal2 charon: 12[ENC] generating CREATE_CHILD_SA request 0 [ EF(2/3) ]
Dec 21 16:08:06 cminternal2 charon: 12[ENC] generating CREATE_CHILD_SA request 0 [ EF(3/3) ]
Dec 21 16:08:06 cminternal2 charon: 12[NET] sending packet: from 10.121.100.1[4500] to 79.244.65.220[61636] (1248 bytes)
Dec 21 16:08:06 cminternal2 charon: 12[NET] sending packet: from 10.121.100.1[4500] to 79.244.65.220[61636] (1248 bytes)
Dec 21 16:08:06 cminternal2 charon: 12[NET] sending packet: from 10.121.100.1[4500] to 79.244.65.220[61636] (603 bytes)
Dec 21 16:08:06 cminternal2 charon: 08[NET] received packet: from 79.244.65.220[61636] to 10.121.100.1[4500] (261 bytes)
Dec 21 16:08:06 cminternal2 charon: 08[ENC] parsed CREATE_CHILD_SA response 0 [ SA No KE TSi TSr ]
Dec 21 16:08:06 cminternal2 charon: 08[CFG] selected proposal: ESP:CHACHA20_POLY1305/CURVE_448/NO_EXT_SEQ
Dec 21 16:08:06 cminternal2 charon: 08[IKE] inbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] inbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] outbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] outbound CHILD_SA cmexternal01psk{1647} established with SPIs c36862c8_i c344a556_o and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] closing CHILD_SA cmexternal01psk{1645} with SPIs c7c38828_i (318135 bytes) ca569602_o (638595 bytes) and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] closing CHILD_SA cmexternal01psk{1645} with SPIs c7c38828_i (318135 bytes) ca569602_o (638595 bytes) and TS 10.121.100.0/24 192.168.0.0/19 === 192.168.204.0/24
Dec 21 16:08:06 cminternal2 charon: 08[IKE] sending DELETE for ESP CHILD_SA with SPI c7c38828
Dec 21 16:08:06 cminternal2 charon: 08[ENC] generating INFORMATIONAL request 1 [ D ]
Dec 21 16:08:06 cminternal2 charon: 08[NET] sending packet: from 10.121.100.1[4500] to 79.244.65.220[61636] (69 bytes)
Dec 21 16:08:06 cminternal2 charon: 10[NET] received packet: from 79.244.65.220[61636] to 10.121.100.1[4500] (69 bytes)
Dec 21 16:08:06 cminternal2 charon: 10[ENC] parsed INFORMATIONAL response 1 [ D ]
Dec 21 16:08:06 cminternal2 charon: 10[IKE] received DELETE for ESP CHILD_SA with SPI ca569602
Dec 21 16:08:06 cminternal2 charon: 10[IKE] CHILD_SA closed
Log of subsidiary ipfire
Dec 21 16:08:06 cmexternal01 charon: 08[NET] received packet: from 84.157.126.66[4500] to 192.168.178.173[4500] (1248 bytes)
Dec 21 16:08:06 cmexternal01 charon: 08[ENC] parsed CREATE_CHILD_SA request 0 [ EF(1/3) ]
Dec 21 16:08:06 cmexternal01 charon: 08[ENC] received fragment #1 of 3, waiting for complete IKE message
Dec 21 16:08:06 cmexternal01 charon: 16[NET] received packet: from 84.157.126.66[4500] to 192.168.178.173[4500] (1248 bytes)
Dec 21 16:08:06 cmexternal01 charon: 16[ENC] parsed CREATE_CHILD_SA request 0 [ EF(2/3) ]
Dec 21 16:08:06 cmexternal01 charon: 16[ENC] received fragment #2 of 3, waiting for complete IKE message
Dec 21 16:08:06 cmexternal01 charon: 06[NET] received packet: from 84.157.126.66[4500] to 192.168.178.173[4500] (603 bytes)
Dec 21 16:08:06 cmexternal01 charon: 06[ENC] parsed CREATE_CHILD_SA request 0 [ EF(3/3) ]
Dec 21 16:08:06 cmexternal01 charon: 06[ENC] received fragment #3 of 3, reassembled fragmented IKE message (2973 bytes)
Dec 21 16:08:06 cmexternal01 charon: 06[ENC] parsed CREATE_CHILD_SA request 0 [ N(REKEY_SA) SA No KE TSi TSr ]
Dec 21 16:08:06 cmexternal01 charon: 06[CFG] selected proposal: ESP:CHACHA20_POLY1305/CURVE_448/NO_EXT_SEQ
Dec 21 16:08:06 cmexternal01 charon: 06[IKE] inbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 06[IKE] inbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 06[ENC] generating CREATE_CHILD_SA response 0 [ SA No KE TSi TSr ]
Dec 21 16:08:06 cmexternal01 charon: 06[NET] sending packet: from 192.168.178.173[4500] to 84.157.126.66[4500] (261 bytes)
Dec 21 16:08:06 cmexternal01 charon: 10[NET] received packet: from 84.157.126.66[4500] to 192.168.178.173[4500] (69 bytes)
Dec 21 16:08:06 cmexternal01 charon: 10[ENC] parsed INFORMATIONAL request 1 [ D ]
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] received DELETE for ESP CHILD_SA with SPI c7c38828
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] closing CHILD_SA cminternal2psk{1} with SPIs ca569602_i (637157 bytes) c7c38828_o (318135 bytes) and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] closing CHILD_SA cminternal2psk{1} with SPIs ca569602_i (637157 bytes) c7c38828_o (318135 bytes) and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] sending DELETE for ESP CHILD_SA with SPI ca569602
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] CHILD_SA closed
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] outbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[IKE] outbound CHILD_SA cminternal2psk{3} established with SPIs c344a556_i c36862c8_o and TS 192.168.204.0/24 === 10.121.100.0/24 192.168.0.0/19
Dec 21 16:08:06 cmexternal01 charon: 10[ENC] generating INFORMATIONAL response 1 [ D ]
Dec 21 16:08:06 cmexternal01 charon: 10[NET] sending packet: from 192.168.178.173[4500] to 84.157.126.66[4500] (69 bytes)
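To follow the rekey in logs like these without reading every line, the CHILD_SA lifecycle messages can be filtered out. This is a minimal sketch: the function name is made up, and the patterns are simply taken from the charon messages quoted above. Adjacent duplicates are collapsed.

```shell
#!/bin/sh
# Sketch: reduce a syslog stream to the charon messages that describe the
# CHILD_SA lifecycle (rekey job, established, closing, closed, DELETE).
# The function name is hypothetical; patterns come from the logs above.
child_sa_events() {
    grep -E 'charon.*(creating rekey job|CHILD_SA .*established|closing CHILD_SA|CHILD_SA closed|DELETE for ESP CHILD_SA)' | uniq
}
```

For example: child_sa_events < /var/log/messages. Applied to the logs above, it leaves the rekey job, the newly established CHILD_SA, and the DELETE and close of the old one.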
An alternative is to create an fcron job that runs every minute and checks the IPsec tunnel status. Peter’s solution with closeaction=restart is not fail-safe (e.g. when the remote peer or the internet connection is unavailable). That is why I needed a solution that retries opening the IPsec tunnel as soon as possible.
File content at /etc/fcron.minutely/force-ipsec-up.sh
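#!/bin/sh
# Write the current status of the tunnel to a log file (overwritten on every run).
/usr/sbin/ipsec status MYCONNECTIONNAME > /var/log/ipsec-monitoring.log
# "no match" in the status output is treated as "tunnel is down": bring it up again.
if grep -q "no match" /var/log/ipsec-monitoring.log; then
    /usr/sbin/ipsec up MYCONNECTIONNAME
fi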
and don’t forget to mark it executable with chmod +x /etc/fcron.minutely/force-ipsec-up.sh
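A variant of the same idea that does not need the intermediate log file could look roughly like this. It is a sketch under assumptions: IPSEC_BIN, CONN and both function names are made up for illustration, and "no match" is the same marker the script above greps for.

```shell
#!/bin/sh
# Sketch: same check as above, but piping the status output straight into
# grep. IPSEC_BIN, CONN and the function names are hypothetical.
IPSEC_BIN=${IPSEC_BIN:-/usr/sbin/ipsec}
CONN=${CONN:-MYCONNECTIONNAME}

# Succeeds (exit 0) when the status text on stdin contains "no match".
tunnel_is_down() {
    grep -q 'no match'
}

# Check the tunnel once and try to bring it up if it is reported as down.
check_and_restart() {
    if "$IPSEC_BIN" status "$CONN" | tunnel_is_down; then
        "$IPSEC_BIN" up "$CONN"
    fi
}
```

To use it as the fcron script, add a final line calling check_and_restart. Keeping the decision in its own function makes it possible to try it out with pasted status output before relying on it.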