OpenVPN net-to-net multi-WAN throughput

We have a rather special setup and are sometimes seeing strange results. I would like to ask the community for possible explanations or hints for improvement. Maybe the discussion will also help others with similar setups.

We have 2 locations connected via 3 OpenVPN net-to-net tunnels using 2 different WAN connections (1 x cable, 1 x DSL) on one side. This has been working fine for a long time. We are currently trying to increase the data throughput for nightly backups (~60 GB/night) from location A to location B. In order to use multiple connections simultaneously (load balanced), we use a multipath iSCSI connection as described below.

Unfortunately, the overall data throughput seems to be unstable, and I do not fully understand the circumstances.
I ran tests with several different setups:

Location A (data centre):

  • one powerful IPFire with red, green (192.168.16.0/24), blue (192.168.17.0/24), orange (192.168.18.0/24)
  • this machine has direct internet access with 1 GBit/s using a public IPv4 address
  • this machine acts as OpenVPN net-to-net client

Location B (home office):

  • one small/medium-sized IPFire Duo Box with red and green (192.168.0.0/24) networks
  • this machine is connected to the internet via an intermediate Zyxel dual-WAN router, which is connected to the internet on “WAN1” via cable/FRITZ!Box at 300/45 MBit/s down/upstream and on “WAN2” via DSL/FRITZ!Box at 95/35 MBit/s
  • both public IPs are reachable via DynDNS
  • this machine acts as OpenVPN net-to-net server

Details:

There are 3 OpenVPN net-to-net connections, called LINE G (green), LINE B (blue) and LINE O (orange), configured on both sides:

LINE G: loc. A (green)  <=> loc. B (green), OpenVPN subnet 10.0.51.0/24
LINE B: loc. A (blue)   <=> loc. B (green), OpenVPN subnet 10.0.52.0/24
LINE O: loc. A (orange) <=> loc. B (green), OpenVPN subnet 10.0.40.0/24

Firewall rules allow the green & blue networks (loc. A) to access the green network (loc. B) and vice versa. Connections to orange are only allowed from loc. B to loc. A, not in the reverse direction. This works fine so far.
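For illustration, this is roughly what one of these tunnels boils down to at the OpenVPN level. IPFire generates the actual configuration from the web UI, so the snippet below is only a minimal hand-written sketch for LINE B (server side, loc. B) with placeholder port and certificate paths:

    # LINE B, server side (loc. B) - simplified sketch, placeholder values
    dev tun
    proto udp
    port 1195                          # each LINE uses its own port
    tls-server
    ifconfig 10.0.52.1 10.0.52.2       # tunnel endpoints inside 10.0.52.0/24
    route 192.168.17.0 255.255.255.0   # traffic for blue (loc. A) goes into this tunnel
    ca ca.crt                          # placeholder certificate/key paths
    cert server.crt
    key server.key

The important part is the "route" line: each tunnel should only attract traffic whose destination matches its configured remote subnet, which is why traffic showing up on the 'wrong' LINE is so surprising.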

A client at loc. A (connected to the green & blue networks) mounts an iSCSI target on a server in green (loc. B) via a multipath (MPIO) connection:

  • 4 paths from diff. green IPs (loc. A), supposed to use LINE G, connecting to one single green IP (loc. B)
  • 4 paths from diff. blue IPs (loc. A), supposed to use LINE B, connecting to the same single green IP (loc. B)
    Each of these 8 parallel connections works on its own. Our overall experience is that increasing the number of connections (up to a maximum of 32) helps to increase throughput; a sketch of such a multipath setup follows below.
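For readers who want to reproduce the multipath part: on a Linux initiator, binding iSCSI sessions to specific source interfaces and spreading I/O over the resulting paths looks roughly like the sketch below. The interface names, the target address 192.168.0.10 and the IQN are made-up placeholders (our client actually establishes its paths via the MPIO stack):

    # one iSCSI interface definition per source NIC (hypothetical names)
    iscsiadm -m iface -I ifaceA --op=new
    iscsiadm -m iface -I ifaceA --op=update -n iface.net_ifacename -v eth0
    iscsiadm -m iface -I ifaceB --op=new
    iscsiadm -m iface -I ifaceB --op=update -n iface.net_ifacename -v eth1

    # discover the target through both interfaces, then log in on every path
    iscsiadm -m discovery -t sendtargets -p 192.168.0.10 -I ifaceA -I ifaceB
    iscsiadm -m node -T iqn.2020-01.example:backup --login

    # multipath: group all paths of the LUN and use them round-robin
    # (/etc/multipath.conf: path_grouping_policy multibus)
    multipath -ll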

We tested different combinations (“-/-” = LINE O not established):

WAN used per LINE              throughput MByte/s
G        B        O            loc. B->A (up) / A->B (down)
cable    cable    cable         5.9 /  7.3
cable    cable    DSL           5.8 /  3.9
cable    DSL      DSL          10.2 /  4.5   upstream load balanced
DSL      DSL      DSL           4.3 / 10.0
cable    DSL      cable        10.3 / 13.5   upstream load balanced
DSL      cable    cable        10.3 / 14.8   upstream load balanced
DSL      cable    DSL          10.2 /  9.4   upstream load balanced
DSL      DSL      cable         4.3 / 20.0   => highest downstream rate
DSL      DSL      -/-           4.3 / 10.1
DSL      cable    -/-          10.3 / 12.7   upstream load balanced
cable    DSL      -/-          10.1 /  9.1   upstream load balanced

Surprisingly, the highest downstream throughput was achieved when LINE O [loc. A (orange) to loc. B (green)] ran over the cable connection, although no data should be transferred from the orange network at all, and the firewall rules at loc. B deny incoming connections from loc. A (orange).

It seems to me that at least one of the two IPFire appliances is internally a little ‘confused’ and mixes up traffic between the 3 OpenVPN net-to-net tunnels…?

The OpenVPN net-to-net traffic charts show that in this case the entire downstream traffic uses LINE O, and only the upstream traffic is distributed equally between LINEs G and B.

Further remarks:

  • both IPFire appliances are far from reaching their CPU load limits (loc. A ~20%, loc. B ~60%)
  • the raw bandwidth of the cable/DSL connections was verified with iperf beforehand
  • the intermediate Zyxel dual-WAN router is configured to respond to incoming OpenVPN traffic on the gateway that received it; it is able to route traffic at up to ~600 MBit/s
  • I also tested throughput using a 4th OpenVPN net-to-net connection (a dummy network 192.168.19.0/24 at loc. A to green at loc. B); this decreased the overall throughput, although the connection was effectively not used
  • I used a 2nd IPFire (a very small machine, an APU board) at loc. A, connected to the green & blue networks with a different public IP address. The OpenVPN net-to-net tunnel from blue (loc. A) to green (loc. B) was established using this machine => no substantial influence on overall throughput. This might indicate that the problem is located on the IPFire at loc. B; unfortunately I cannot test a 2nd IPFire appliance at loc. B.
  • I set up a second ALIAS IP on green (loc. B) and routed all traffic from blue (loc. A) to green (loc. B) over this IP (routes on the iSCSI target server adjusted accordingly; see the sketch below) => no substantial influence on overall throughput
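The ALIAS-IP experiment from the last remark boiled down to something like the following; all addresses are placeholders:

    # on the IPFire at loc. B: add a second (alias) IP to the green interface
    ip addr add 192.168.0.254/24 dev green0

    # on the iSCSI target server: reach blue (loc. A) via the alias IP
    # instead of the primary green address
    ip route add 192.168.17.0/24 via 192.168.0.254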

My questions are:

  • Why do LINEs G & B only carry upstream traffic (loc. B → A), and why does LINE O only carry downstream traffic, although there isn’t any data transfer between loc. A (orange) and loc. B (green)?
  • Why does upstream traffic seem to be distributed over 2 lines while downstream traffic only uses a single line?
  • The test results given above tend to vary when both IPFires are rebooted; it seems that switching all OpenVPN tunnels off and on again helps to reach the described state
  • Does anyone have another idea how to aggregate the bandwidth of two internet connections using only IPFire’s onboard resources? I don’t intend to use additional commercial solutions.

I know that IPFire itself is neither able to use multiple WANs nor able to distribute traffic over multiple channels simultaneously. But I think it is (or should be) able to transfer traffic between different IPs over different OpenVPN tunnels simultaneously. So if we use an iSCSI client as a “load balancer”, with a target that can manage multiple connections and an intermediate dual-WAN router, the setup described above should work - shouldn’t it?
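One way to check whether the kernel on either IPFire really maps each flow to the intended tunnel is to query the routing table directly. A diagnostic sketch (the addresses are examples; the tunnel device names depend on the connection names, check them with "ip addr"):

    # routes pointing into the OpenVPN tunnel devices
    ip route | grep tun

    # which route would a packet from a blue client (loc. A)
    # to the iSCSI target (loc. B, green) actually take?
    ip route get 192.168.0.10 from 192.168.17.5 iif blue0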

Any help is welcome!

I already expected that this post wouldn’t meet with broad interest - but there could be some technicians involved in similar setups, so I have provided some data…

Over the past days I ran a few more tests and put a new, cheap and quite simple dual-WAN router (a TP-Link ER605) in front of the IPFire. Results:

  • upload speed is now ~10 MByte/s most of the time (DSL & cable upstream utilized simultaneously)
  • download speed reached ~24 MByte/s at maximum and stays stable between 18 and 21 MByte/s, although the cable connection is utilized exclusively

What I have learned so far is that such a complex environment depends heavily on the individual components and requires experimentation.

Sorry for the late reply, I’m new to the community! Curious about a couple of things: 1) Why the multiple OpenVPN connections between locations - why not just one, and then set up groups/rules for permissions? Or, at most, one per WAN connection at loc. B? And 2) why iSCSI over the WAN?! For efficiency’s sake, wouldn’t rsync be much better (specifically for backup/archive purposes)? No worries if you’ve moved on - just curious! :slight_smile:

Hello dwfallin,
thanks for your reply - to answer your questions:

  1. why the multiple OpenVPN connections […]?

We have redundant internet connections (cable/DSL) in the office, and I’d like to aggregate the bandwidth both connections offer and profit from the redundancy as well.

  2. why iSCSI over the WAN?

This is a little bit tricky. First, iSCSI offers a load-balancing algorithm when transferring data over two connections, and I didn’t find any other ‘simple’ solution. Second, it allows the encryption key of the remote volume (encrypted with BitLocker) to be kept on the ‘client’. This means that even if the backup NAS were stolen, the data would be safe, because the target machine doesn’t know the encryption key - it’s just hosting a dumb block device.

You’re right: rsync would be much more efficient, but in that case the target NAS would need to know the encryption key.
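For comparison, a typical rsync push over SSH (host name and paths made up) would be a one-liner like the following; it only transfers changed data, but the receiving NAS inevitably stores, and can read, the plaintext files:

    # incremental, compressed backup over SSH
    rsync -az --delete /data/backup/ backup@nas.example.org:/volume1/backup/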

Happy New Year!

baruch
