I manage two sites that are connected via an IPsec tunnel, with IPFire routers on both sides managing the VPN. I have a Synology backup appliance on the local LAN and another at the remote site. Backups from the local servers to the local Synology work perfectly, but Snapshot Replication to the remote site’s Synology has been spotty and hit or miss: the Snapshot Replication app just seems to stall with no error messages. After fighting these issues with Synology for a year, they keep coming back saying that they see lots of timeouts and claiming that our network is “unstable”. I can regularly ping all switches, the local IPFire, the remote IPFire, the remote Synology and the local Synology without any issues. I’ve run Wireshark captures and can’t find any problems. I started running an EMCO Ping Monitor session on both sides to log network activity and watch for outages. I know there have been some DNS issues that have affected things at times, but nothing that would explain the constant struggle to get backups to transfer that they describe. I don’t see anything of concern in /var/log/messages. The Synology devices on both sides connect by IP, not by name, so I don’t think DNS is a factor.
I’m looking for ideas on what to check or watch for. If this were your network and your situation, what would you check to validate or disprove their ‘network is unstable’ claim?
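One way to make the ping-monitor logs useful against the ‘unstable network’ claim is to reduce them to explicit outage windows and line those up with the times a replication job stalled. A minimal sketch, assuming the probe results from each side can be exported as (timestamp, success) pairs; the record format, find_outages and the three-failure threshold are hypothetical choices, not EMCO features.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical probe record: (timestamp, True if the probe got a reply).
Probe = Tuple[datetime, bool]
Outage = Tuple[datetime, datetime, int]


def find_outages(probes: List[Probe], min_failures: int = 3) -> List[Outage]:
    """Group consecutive failed probes into outage windows.

    Returns (first_failure, last_failure, count) for each run of at least
    `min_failures` consecutive failures, so isolated single drops are ignored.
    """
    outages: List[Outage] = []
    run: List[datetime] = []
    for ts, ok in sorted(probes):
        if not ok:
            run.append(ts)
            continue
        if len(run) >= min_failures:
            outages.append((run[0], run[-1], len(run)))
        run = []
    # A run of failures still open at the end of the log also counts.
    if len(run) >= min_failures:
        outages.append((run[0], run[-1], len(run)))
    return outages


if __name__ == "__main__":
    # Sample data: one probe every 10 s, a 5-probe outage and a single drop.
    start = datetime(2024, 1, 1)
    results = [True] * 10 + [False] * 5 + [True] * 3 + [False] + [True] * 2
    probes = [(start + timedelta(seconds=10 * i), ok) for i, ok in enumerate(results)]
    for first, last, n in find_outages(probes):
        print(f"{first:%H:%M:%S} -> {last:%H:%M:%S}: {n} failed probes")
```

If the outage windows from both sides never overlap the times the replication job stops making progress, that is a concrete point to put back to Synology.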
Is there any QoS rule running between your sites?
Could you please detail your ISP connection speeds at each site, if possible?
Are the Synology devices aware of the reduced bandwidth between them?
The Synology devices did their initial sync side by side over a crossover cable; since then, everything has been replicated to the offsite location over the tunnel.
In Active Backup for Business, the on-site backup app, there are sections for backing up physical servers, virtual servers, PCs and file servers. Our file servers are virtual, and until recently I thought I needed to back them up both as virtual machines and as file servers so I could restore individual files without restoring the whole VM, but I’ve since learned how to restore individual files from virtual machine backups. They’re claiming there are two factors: network instability and the sheer number of files we have. Altogether it’s about 30 TB and 650,000,000 files after dedup.
It’s possible; I’m not ruling anything out at this point. Given that both network cards are 1 Gbps and the connections attached to them are 50 Mbps and 75 Mbps, I doubt the network cards are a bottleneck. Interruptions caused by system components might be a factor, but I honestly don’t know where to look.
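To put the link speeds in perspective, a best-case transfer time at line rate can be computed directly. A minimal sketch, assuming the slower 50 Mbps link bounds the whole path and that both links are symmetric (neither is stated in the thread); line_rate_hours, the efficiency factor and the 100 GB nightly delta are hypothetical illustrations, not anything Synology exposes.

```python
def line_rate_hours(size_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Best-case hours to move `size_bytes` over a link of `link_mbps` megabits/s.

    `efficiency` < 1.0 models protocol overhead (IPsec/TCP headers, retransmits);
    whatever value you pick is an assumption, not a measurement.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Assumption: the slower of the two site links bounds the whole transfer.
    bottleneck_mbps = min(50, 75)
    full = line_rate_hours(30e12, bottleneck_mbps)  # 30 TB, decimal units
    print(f"30 TB at {bottleneck_mbps} Mbps: {full:.0f} h ({full / 24:.1f} days)")
    # Hypothetical 100 GB nightly delta with an assumed 80% efficiency.
    nightly = line_rate_hours(100e9, bottleneck_mbps, efficiency=0.8)
    print(f"100 GB delta at 80% efficiency: {nightly:.1f} h")
```

At 50 Mbps, moving the full 30 TB would take roughly 1,333 hours (about 55 days) even at full line rate, so a nightly job only has a chance of finishing if the delta between snapshots is small by comparison.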
I really don’t know how to answer that. There is no setting in the application that I can see to ‘throttle’ Snapshot Replication to a lower bandwidth; the whole process just runs as fast as both systems can manage. The only settings available are:
Partner Server
- Server name or IP address
- Username
- Password
- [x] Use encrypted connection
Schedule
- [x] Enable replication schedule
- Select days: Daily
- Starting at: 00:30
- Frequency: Every day
- Repeat until: (grayed out)
- Timeout notification settings for the scheduled sync, Waiting time (minutes): 720
- [ ] Only send snapshots within the transfer window (grayed out)
Advanced
- [ ] Replicate local snapshots
- [ ] Use the time in the GMT + 0 time zone to name the snapshots
Each time I start a job, it seems to go well for a number of hours, then for no particular reason it just hangs, and then it looks like this.
The job stays in this state indefinitely, until I reboot the device or restart the job. The first 8 GB of this job transferred within the first few hours after it started, then sometime yesterday morning it hung. There are no error messages on either end. I can actively ping both devices and the network is not down; the job just stops. Any suggestions on things to change?
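Because ping keeps succeeding while the job hangs, another test is whether a single TCP connection can stay healthy across the tunnel for as long as a replication job runs. ICMP probes do not hold a long-lived session open, so they may not notice, for example, a stall around an IPsec rekey or a state-table problem on one of the routers. A minimal sketch, assuming Python 3.8+ on a spare host at each site (not necessarily the Synology boxes) and a firewall rule allowing the hypothetical port 5001 through the tunnel; serve_echo, heartbeat and the address 192.0.2.10 are placeholders.

```python
import socket
import sys
import time
from typing import List, Optional, Tuple


def serve_echo(listener: socket.socket) -> None:
    """Run at the remote site: echo back every byte on each accepted connection."""
    with listener:
        while True:
            conn, _ = listener.accept()
            with conn:
                while True:
                    data = conn.recv(1024)
                    if not data:  # client closed the connection
                        break
                    conn.sendall(data)


def heartbeat(host: str, port: int, count: int, interval: float,
              timeout: float) -> List[Tuple[int, Optional[float]]]:
    """Send `count` numbered heartbeats over ONE long-lived TCP connection.

    Returns (seq, rtt_seconds or None). None means no full echo arrived within
    `timeout`, i.e. the session stalled even if ping still works.
    """
    results: List[Tuple[int, Optional[float]]] = []
    # create_connection applies `timeout` to every later send/recv as well.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for seq in range(count):
            msg = f"{seq:08d}\n".encode()
            start = time.monotonic()
            try:
                sock.sendall(msg)
                reply = b""
                while len(reply) < len(msg):
                    chunk = sock.recv(len(msg) - len(reply))
                    if not chunk:
                        raise ConnectionError("peer closed the connection")
                    reply += chunk
                results.append((seq, time.monotonic() - start))
            except OSError as exc:  # includes socket timeouts and resets
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} heartbeat {seq} failed: {exc}")
                results.append((seq, None))
                break  # session is no longer usable; reconnect logic omitted
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical usage: "python heartbeat.py server 5001" at the remote site,
    # "python heartbeat.py client 192.0.2.10 5001" at the local site.
    if sys.argv[1] == "server":
        serve_echo(socket.create_server(("0.0.0.0", int(sys.argv[2]))))
    else:
        # One heartbeat per minute for 24 hours, 30 s reply timeout.
        heartbeat(sys.argv[2], int(sys.argv[3]), count=24 * 60, interval=60, timeout=30)
```

If the heartbeat session dies at roughly the same times the replication stalls, that is concrete evidence to take to Synology or to look at on the IPFire side; if it runs cleanly for days, it weakens the ‘unstable network’ explanation.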
I’m sorry, but I’m not very familiar with Synology products and apps.
A competitor, QNAP, lets you set the bandwidth for each configured job in its Hybrid Backup Sync app when the RTRR protocol is used for data transfer. This is quite a useful setting for backups between remote sites (even with a 100/10 source and 14/2 destination DSL connection). TCP BBR is the name of the option (I can’t provide a screenshot right now).
A quick internet search suggested assigning a specific bandwidth limit to the user account in charge of the operation, but Snapshot Replication may not need a user. Or perhaps the target needs to be classified correctly. I truly don’t know.
Don’t get me wrong: if your IPsec connection has problems, these hints won’t help you solve them. And I trust that Synology can really do the job if the configuration has been done properly. But sometimes transport is not the real problem, only one component of the failure.