Network Timeouts

I manage two sites connected by an IPsec tunnel, with IPFire routers on both sides managing the VPN. I have a Synology backup appliance on the local LAN and another at the remote site. Backups from the local servers to the local Synology work perfectly, but Snapshot Replication to the remote site’s Synology has been spotty and hit-or-miss. The Snapshot Replication app just seems to stall with no error messages. I’ve been fighting these issues with Synology for a year, and they keep coming back saying they are seeing lots of timeouts and claiming that our network is “unstable”.

I can regularly ping all the switches, the local IPFire, the remote IPFire, the remote Synology, and the local Synology without any issues. I’ve run Wireshark captures and I can’t find any problems. I started running an EMCO Ping Monitor session on both sides to log network activity and watch for outages. I know there have been some DNS issues that have affected things at times, but nothing that would explain the constant struggle to get backups to transfer that they describe. I don’t see anything of concern in /var/log/messages. The Synology devices on both sides connect by IP, not by name, so I don’t think DNS is a factor.

I’m looking for ideas on what to check or watch for. If this were your network and your situation, what would you watch for or check to validate or disprove their ‘network is unstable’ claim?

Chris

Is there any QoS rule running between your sites?
Could you please describe your ISP capabilities, if possible?
Are the Synology devices aware of the reduced bandwidth between them?

If it was me I’d want Synology to provide me something to support their claims of timeouts.

I’d also be looking at network latency as a possible cause.

If the issues are comms-related, then facts and data need to be collected; otherwise it can easily become an exercise in finger-pointing.
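Pings alone may not settle the argument, since ICMP echoes can succeed while TCP sessions are having trouble. As one way to collect that data, here is a minimal sketch, assuming Python 3 on a machine at one site, that repeatedly opens a TCP connection to a service on the remote Synology and logs timestamped connect times and failures. The host, port, interval, and 200 ms "slow" threshold are placeholders, not values from this thread.

```python
from __future__ import annotations

import socket
import statistics
import time
from datetime import datetime, timezone


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> float | None:
    """Return the TCP connect time in milliseconds, or None if it failed or timed out."""
    start = time.perf_counter()
    try:
        # create_connection raises OSError (socket.timeout is a subclass) on failure
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


def summarize(samples: list[float | None], slow_ms: float = 200.0) -> dict:
    """Turn raw samples into counts that can be shown to a vendor."""
    ok = [s for s in samples if s is not None]
    return {
        "total": len(samples),
        "failures": len(samples) - len(ok),
        "slow": sum(1 for s in ok if s > slow_ms),
        "max_ms": max(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
    }


def monitor(host: str, port: int, interval: float = 10.0, count: int = 360) -> dict:
    """Probe host:port `count` times, printing one timestamped line per probe."""
    samples: list[float | None] = []
    for _ in range(count):
        ms = probe_tcp(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        result = "FAILED" if ms is None else f"{ms:.1f} ms"
        print(f"{stamp} {host}:{port} {result}", flush=True)
        samples.append(ms)
        time.sleep(interval)
    return summarize(samples)
```

For example, monitor("192.0.2.10", 443, interval=10, count=8640) would probe every 10 seconds for a day and return the summary counts; the address and port there are placeholders for the remote device and a port it actually listens on.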

Source: AT&T dedicated fiber, 50/50 Mbps

Destination: Baldwin business fiber, 75/75 Mbps

No QoS rules between locations
Static IP on both sides
I asked for logs and proof of their claims.

Chris

The Synology devices did their initial sync side by side over a crossover cable; since then, all replication has gone to the offsite location.

Active Backup for Business, the on-site backup app, has sections for backing up physical servers, virtual servers, PCs, and file servers. Our file servers are virtual, and until recently I thought I needed to back them up both as virtual machines and as file servers so I could restore individual files without the whole VM, but I’ve since learned how to restore individual files from virtual machines. They’re claiming there are two factors: network instability and the sheer number of files we have. Altogether it’s about 30 TB and 650,000,000 files after dedup.
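Some rough arithmetic puts that data volume in perspective against the links listed above. The sketch assumes decimal terabytes, ideal throughput (efficiency = 1.0, ignoring IPsec and TCP overhead), and that the 50 Mbps upload at the source is the limiting link, since the destination’s 75 Mbps download is faster.

```python
TB = 10**12  # decimal terabyte (assumption; 1 TiB would be 2**40 bytes)
GB = 10**9


def transfer_days(data_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move data_bytes over a link_mbps link at the given efficiency."""
    bits_per_second = link_mbps * 1_000_000 * efficiency
    return data_bytes * 8 / bits_per_second / 86_400


def max_bytes_per_day(link_mbps: float, efficiency: float = 1.0) -> float:
    """Upper bound on the bytes one day of continuous transfer can move."""
    return link_mbps * 1_000_000 * efficiency / 8 * 86_400


print(f"Full 30 TB at 50 Mbps: {transfer_days(30 * TB, 50):.1f} days")  # 55.6 days
print(f"Daily ceiling at 50 Mbps: {max_bytes_per_day(50) / GB:.0f} GB")  # 540 GB
```

At roughly 55 days for the full data set, seeding the two devices side by side was the practical choice; after that, each daily run can move at most about 540 GB over the tunnel, and less in practice once overhead is counted.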

Hi @cwensink.

I’ll throw in an idea that may be silly, but who knows, it might help:

  • Could it be a problem with the type of VPN encryption/decryption method in use?

  • Does the equipment have AES-NI / entropy support (I don’t know if entropy has anything to do with it)?

  • Network card or processor bottlenecks?

  • System interruptions caused by some component?

Maybe someone with more experience can shed a little more light.

Regards.
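On the AES-NI question: IPFire is Linux-based, so on x86 hardware the kernel lists CPU features in /proc/cpuinfo. A minimal sketch, assuming Python 3 is available on the router (or that the /proc/cpuinfo text is copied to another machine), that checks for the aes flag (AES-NI) and the rdrand flag (an on-chip random number generator, the closest CPU-level answer to the entropy question). A present flag only shows that the CPU supports the feature, not that the IPsec cipher in use takes advantage of it.

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags from /proc/cpuinfo text (x86 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()  # e.g. ARM kernels use a 'Features' line instead


def main() -> None:
    path = Path("/proc/cpuinfo")
    if not path.exists():
        print("No /proc/cpuinfo here; run this on the Linux-based router itself.")
        return
    flags = cpu_flags(path.read_text())
    print("AES-NI (aes flag):", "aes" in flags)
    print("Hardware RNG (rdrand flag):", "rdrand" in flags)


if __name__ == "__main__":
    main()
```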

Still waiting for an answer to this question, if possible.

It’s possible; I’m not ruling out anything at this point. Given that both network cards are 1 Gbps and the connections attached to them are 50 Mbps and 75 Mbps, I doubt there is a bottleneck on the network cards. Interruptions from system components might be a factor, but I honestly don’t know where to look.

I really don’t know how to answer that. There is no setting in the application that I can see to ‘throttle’ the snapshot replication process to a lower bandwidth; the whole process just runs as fast as both systems are capable of. The only settings I have options for are:

Partner Server
-Server name or IP address
-username
-password
[x] use encrypted connection

Schedule
[x] enable replication schedule
Select days: Daily
Starting at: 00:30
Frequency: Every day
Repeat until (grayed out)

Timeout notification settings for the scheduled sync
Waiting time (minutes): 720

[ ] Only send snapshots within the transfer window (grayed out)

Advanced
[ ] Replicate local snapshots

[ ] Use the time in the GMT + 0 time zone to name the snapshots

Each time I start a job, it seems to go well for a number of hours; then, for no particular reason, it just hangs and looks like this.

The job stays in this state indefinitely, until I reboot the device or restart the job. The first 8 GB of this job were processed within the first few hours after it started; then, sometime yesterday morning, it hung. There are no error messages on either end. I can ping both devices and the network is not down; the job just stops. Any suggestions on things to change?
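To check whether the replication traffic really stops while pings keep succeeding, one option is to watch the byte counters on the source-side IPFire, which is Linux-based and exposes per-interface counters in /proc/net/dev. A minimal sketch, assuming Python 3 can run on the router and that the replication traffic leaves through its WAN interface; the name red0 and the thresholds below are placeholders to adjust.

```python
from __future__ import annotations

import time
from pathlib import Path


def read_counters(proc_net_dev: str, iface: str) -> tuple[int, int]:
    """Return (rx_bytes, tx_bytes) for iface from the text of /proc/net/dev."""
    for line in proc_net_dev.splitlines()[2:]:  # the first two lines are headers
        name, _, rest = line.partition(":")
        if name.strip() == iface:
            fields = rest.split()
            return int(fields[0]), int(fields[8])  # receive bytes, transmit bytes
    raise KeyError(f"interface {iface!r} not found")


def sample(iface: str, interval: float, count: int) -> list[int]:
    """Record the transmit byte counter every `interval` seconds, logging each reading."""
    readings = []
    for _ in range(count):
        _, tx = read_counters(Path("/proc/net/dev").read_text(), iface)
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {iface} tx_bytes={tx}", flush=True)
        readings.append(tx)
        time.sleep(interval)
    return readings


def find_stalls(readings: list[int], min_delta: int, min_run: int) -> list[int]:
    """Return the reading index where each quiet stretch begins.

    A quiet stretch is `min_run` consecutive intervals that each moved
    fewer than `min_delta` bytes.
    """
    stalls: list[int] = []
    run = 0
    for i in range(1, len(readings)):
        if readings[i] - readings[i - 1] < min_delta:
            run += 1
            if run == min_run:
                stalls.append(i - min_run)
        else:
            run = 0
    return stalls
```

For example, find_stalls(sample("red0", interval=60, count=1440), min_delta=1_000_000, min_run=10) would flag every stretch of ten or more minutes in a day during which less than 1 MB per minute went out. Lining those timestamps up against the EMCO ping logs shows whether the link stayed reachable while the transfer went quiet, which narrows down where to look. The counter covers all traffic on the interface, not just replication.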

I’m sorry, but I’m not very familiar with Synology products and apps.

A competitor (QNAP) lets you set the bandwidth for every configured operation in its Hybrid Backup Sync app, if the RTRR protocol is used for data transfer. This is quite a useful setting for backups between remote sites (even with a 100/10 source and a 14/2 destination DSL connection). TCP BBR is the name of the option (I can’t provide a screenshot currently).

A quick internet search suggested assigning a specific bandwidth limit to the user in charge of the operation, but maybe Snapshot Replication doesn’t use a user account. Or perhaps the target needs to be classified correctly. I truly don’t know.

Don’t get me wrong: if your IPsec connection has problems, these hints won’t help you solve them. And I trust that Synology can really do the job if the configuration is done properly. But sometimes transport is not the real problem, only one component of the failure.
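For readers unfamiliar with how a per-operation bandwidth limit works, a common generic technique is a token bucket: the sender earns a byte budget at a fixed rate and waits whenever a chunk would exceed it. This is a minimal illustrative sketch, not a claim about how QNAP or Synology implement their limits. It is also different from TCP BBR, which is a congestion-control algorithm rather than a fixed cap.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


class TokenBucket:
    """Average rate of `rate` bytes/second, with bursts of up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def wait_time(self, nbytes: int) -> float:
        """Take nbytes of budget and return 0.0, or return how long to wait first."""
        if nbytes > self.capacity:
            raise ValueError("a chunk larger than the bucket capacity could never be sent")
        now = self.clock()
        # Refill for the time elapsed since the last call, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return 0.0
        return (nbytes - self.tokens) / self.rate


def send_throttled(chunks: Iterable[bytes], bucket: TokenBucket,
                   send: Callable[[bytes], None]) -> None:
    """Send each chunk only when the bucket has budget for it."""
    for chunk in chunks:
        while (delay := bucket.wait_time(len(chunk))) > 0:
            time.sleep(delay)
        send(chunk)
```

As a hypothetical example, TokenBucket(rate=40_000_000 / 8, capacity=1_000_000) would hold a sender to about 40 Mbps on average, leaving some headroom on a 50 Mbps uplink.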