OpenVPN net-to-net multi-WAN throughput

We have a rather special setup and are sometimes seeing strange results. I would like to ask the community for possible explanations or hints for improvement. Maybe the discussion will also help others with similar setups.

We have 2 locations connected via 3 OpenVPN net-to-net tunnels using 2 different WAN connections (1 x cable, 1 x DSL) on one side. This has been working fine for a long time. We are currently trying to increase the data throughput for nightly backups (~60 GB/night) from location A to location B. In order to use multiple connections simultaneously (load balanced), we use a multipath iSCSI connection as described below.

Unfortunately, the overall data throughput seems to be unstable, and I do not fully understand the circumstances.
I ran tests with several different setups:

Location A (data centre):

  • one powerful IPFire with red, green (192.168.16.0/24), blue (192.168.17.0/24), orange (192.168.18.0/24)
  • this machine has direct internet access with 1 GBit/s using a public IPv4 address
  • this machine acts as OpenVPN net-to-net client

Location B (home office):

  • one small/medium-sized IPFire Duo Box with red and green (192.168.0.0/24) networks
  • this machine is connected to the internet via an intermediate Zyxel dual-WAN router, which is connected to the internet on “WAN1” via cable/FRITZ!Box at 300/45 MBit/s down/upstream and on “WAN2” via DSL/FRITZ!Box at 95/35 MBit/s
  • both public IPs are reachable via DynDNS
  • this machine acts as OpenVPN net-to-net server

Details:

There are 3 OpenVPN net-to-net connections, called LINE G (green), LINE B (blue) and LINE O (orange), configured on both sides:

LINE G: loc. A (green)  <=> loc. B (green), OpenVPN subnet 10.0.51.0/24
LINE B: loc. A (blue)   <=> loc. B (green), OpenVPN subnet 10.0.52.0/24
LINE O: loc. A (orange) <=> loc. B (green), OpenVPN subnet 10.0.40.0/24

Firewall rules allow the green & blue networks (loc. A) to access the green network (loc. B) and vice versa. Connections to orange are only allowed from loc. B to loc. A, not in the reverse direction. This works fine so far.
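For illustration, this is roughly what one of these tunnels boils down to at the OpenVPN level. IPFire generates the actual configuration from the web UI, so the snippet below is only a minimal hand-written sketch for LINE B (server side, loc. B) with placeholder port and certificate paths:

    # LINE B, server side (loc. B) - simplified sketch, placeholder values
    dev tun
    proto udp
    port 1195                          # each LINE uses its own port
    tls-server
    ifconfig 10.0.52.1 10.0.52.2       # tunnel endpoints inside 10.0.52.0/24
    route 192.168.17.0 255.255.255.0   # traffic for blue (loc. A) goes into this tunnel
    ca ca.crt                          # placeholder certificate/key paths
    cert server.crt
    key server.key

The important part is the "route" line: each tunnel should only attract traffic whose destination matches its configured remote subnet, which is why traffic showing up on the 'wrong' LINE is so surprising.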

A client at loc. A (connected to the green & blue networks) mounts an iSCSI target on a server in green (loc. B) via a multipath (MPIO) connection:

  • 4 paths from diff. green IPs (loc. A), supposed to use LINE G, connecting to one single green IP (loc. B)
  • 4 paths from diff. blue IPs (loc. A), supposed to use LINE B, connecting to the same single green IP (loc. B)
    Each of these 8 parallel connections works on its own. Our overall experience is that increasing the number of connections (up to a maximum of 32) helps to increase throughput; a sketch of such a multipath setup follows below.
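For readers who want to reproduce the multipath part: on a Linux initiator, binding iSCSI sessions to specific source interfaces and spreading I/O over the resulting paths looks roughly like the sketch below. The interface names, the target address 192.168.0.10 and the IQN are made-up placeholders (our client actually establishes its paths via the MPIO stack):

    # one iSCSI interface definition per source NIC (hypothetical names)
    iscsiadm -m iface -I ifaceA --op=new
    iscsiadm -m iface -I ifaceA --op=update -n iface.net_ifacename -v eth0
    iscsiadm -m iface -I ifaceB --op=new
    iscsiadm -m iface -I ifaceB --op=update -n iface.net_ifacename -v eth1

    # discover the target through both interfaces, then log in on every path
    iscsiadm -m discovery -t sendtargets -p 192.168.0.10 -I ifaceA -I ifaceB
    iscsiadm -m node -T iqn.2020-01.example:backup --login

    # multipath: group all paths of the LUN and use them round-robin
    # (/etc/multipath.conf: path_grouping_policy multibus)
    multipath -ll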

We tested different combinations (“-/-” = LINE O not established):

WAN used per LINE              throughput MByte/s
G        B        O            loc. B->A (up) / A->B (down)
cable    cable    cable         5.9 /  7.3
cable    cable    DSL           5.8 /  3.9
cable    DSL      DSL          10.2 /  4.5   upstream load balanced
DSL      DSL      DSL           4.3 / 10.0
cable    DSL      cable        10.3 / 13.5   upstream load balanced
DSL      cable    cable        10.3 / 14.8   upstream load balanced
DSL      cable    DSL          10.2 /  9.4   upstream load balanced
DSL      DSL      cable         4.3 / 20.0   => highest downstream rate
DSL      DSL      -/-           4.3 / 10.1
DSL      cable    -/-          10.3 / 12.7   upstream load balanced
cable    DSL      -/-          10.1 /  9.1   upstream load balanced

Surprisingly, the highest downstream throughput was achieved when LINE O [loc. A (orange) to loc. B (green)] ran over the cable connection, although no data should be transferred from the orange network at all, and the firewall rules at loc. B deny incoming connections from loc. A (orange).

It seems to me that at least one of the two IPFire appliances is internally a little ‘confused’ and mixes up traffic between the 3 OpenVPN net-to-net tunnels…?

The OpenVPN net-to-net traffic charts show that in this case the entire downstream traffic uses LINE O, and only the upstream traffic is distributed equally between LINEs G and B.

Further remarks:

  • both IPFire appliances are far from reaching their CPU load limits (loc. A ~20%, loc. B ~60%)
  • the raw bandwidth of the cable/DSL connections was verified with iperf beforehand
  • the intermediate Zyxel dual-WAN router is configured to respond to incoming OpenVPN traffic on the gateway that received it; it is able to route traffic at up to ~600 MBit/s
  • I also tested throughput using a 4th OpenVPN net-to-net connection (a dummy network 192.168.19.0/24 at loc. A to green at loc. B); this decreased the overall throughput, although the connection was effectively not used
  • I used a 2nd IPFire (a very small machine, an APU board) at loc. A, connected to the green & blue networks with a different public IP address. The OpenVPN net-to-net tunnel from blue (loc. A) to green (loc. B) was established using this machine => no substantial influence on overall throughput. This might indicate that the problem is located on the IPFire at loc. B; unfortunately I cannot test a 2nd IPFire appliance at loc. B.
  • I set up a second ALIAS IP on green (loc. B) and routed all traffic from blue (loc. A) to green (loc. B) over this IP (routes on the iSCSI target server adjusted accordingly; see the sketch below) => no substantial influence on overall throughput
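The ALIAS-IP experiment from the last remark boiled down to something like the following; all addresses are placeholders:

    # on the IPFire at loc. B: add a second (alias) IP to the green interface
    ip addr add 192.168.0.254/24 dev green0

    # on the iSCSI target server: reach blue (loc. A) via the alias IP
    # instead of the primary green address
    ip route add 192.168.17.0/24 via 192.168.0.254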

My questions are:

  • Why do LINEs G & B only carry upstream traffic (loc. B → A), and why does LINE O only carry downstream traffic, although there isn’t any data transfer between loc. A (orange) and loc. B (green)?
  • Why does upstream traffic seem to be distributed over 2 lines while downstream traffic only uses a single line?
  • The test results given above tend to vary when both IPFires are rebooted; it seems that switching all OpenVPN tunnels off and on again helps to reach the described state
  • Does anyone have another idea how to aggregate the bandwidth of two internet connections using only IPFire’s onboard resources? I don’t intend to use additional commercial solutions.

I know that IPFire itself is neither able to use multiple WANs nor able to distribute traffic over multiple channels simultaneously. But I think it is (or should be) able to transfer traffic between different IPs over different OpenVPN tunnels simultaneously. So if we use an iSCSI client as a “load balancer”, with a target that can manage multiple connections and an intermediate dual-WAN router, the setup described above should work - shouldn’t it?
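One way to check whether the kernel on either IPFire really maps each flow to the intended tunnel is to query the routing table directly. A diagnostic sketch (the addresses are examples; the tunnel device names depend on the connection names, check them with "ip addr"):

    # routes pointing into the OpenVPN tunnel devices
    ip route | grep tun

    # which route would a packet from a blue client (loc. A)
    # to the iSCSI target (loc. B, green) actually take?
    ip route get 192.168.0.10 from 192.168.17.5 iif blue0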

Any help is welcome!

I already expected that this post wouldn’t meet with broad interest - but there could be some technicians involved in similar setups, so I have provided some data…

Over the past days I ran a few more tests and put a new, cheap and quite simple dual-WAN router (a TP-Link ER605) in front of the IPFire. Results:

  • upload speed is now ~10 MByte/s most of the time (DSL & cable upstream utilized simultaneously)
  • download speed reached ~24 MByte/s at maximum and stays stable between 18 and 21 MByte/s, although the cable connection is utilized exclusively

What I have learned so far is that such a complex environment depends heavily on the individual components and requires experimentation.

Sorry for the late reply, I’m new to the community! Curious about a couple of things: 1) Why the multiple OpenVPN connections between locations - why not just one, and then set up groups/rules for permissions? Or, at most, one per WAN connection at loc. B? And 2) why iSCSI over the WAN?! For efficiency’s sake, wouldn’t rsync be much better (specifically for backup/archive purposes)? No worries if you’ve moved on - just curious! :slight_smile:

Hello dwfallin,
thanks for your reply - to answer your questions:

  1. why the multiple OpenVPN connections […]?

We have redundant internet connections (cable/DSL) in the office, and I’d like to aggregate the bandwidth both connections offer and profit from the redundancy as well.

  2. why iSCSI over the WAN?

This is a little bit tricky. First, iSCSI offers a load-balancing algorithm when transferring data over two connections, and I didn’t find any other ‘simple’ solution. Second, it allows the encryption key of the remote volume (encrypted with BitLocker) to be kept on the ‘client’. This means that even if the backup NAS were stolen, the data would be safe, because the target machine doesn’t know the encryption key - it’s just hosting a dumb block device.

You’re right: rsync would be much more efficient, but in that case the target NAS would need to know the encryption key.
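For comparison, a typical rsync push over SSH (host name and paths made up) would be a one-liner like the following; it only transfers changed data, but the receiving NAS inevitably stores, and can read, the plaintext files:

    # incremental, compressed backup over SSH
    rsync -az --delete /data/backup/ backup@nas.example.org:/volume1/backup/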

Happy New Year!

baruch
