
An explanation of load balancing and cloud firewalls

Posted: Mon Mar 01, 2021 10:27 am
by bobjones
Hello -
I recently wrote up a less technical explanation of my load-balancing solution. I do plan on writing up a step-by-step guide, but I thought an explanation of how this more complex solution works could benefit users in the meantime. There is a tutorial for a similar solution on this board by @bnhf2, but I was already using ZeroTier and OPNsense and felt this was an easier path than OpenMPTCProuter for my situation. Both work similarly:

Alternative solution: https://wirelessjoint.com/viewtopic.php?f=21&t=1078

From my blog:

As my world changed over the last two years, I moved from a house with a wired, gigabit internet connection to a waterfront marina with WiFi. As the world changed around all of us, my datacenter’s office migrated to a private waterfront warehouse with WiFi. Both sites typically deliver an unstable zero to one Mbps at any given time. I need only three to five Mbps for video conferences, streaming, remote desktops, and even sharing this connection among multiple people. As many people on the edge of broadband find, a combination of solutions might need to be blended to build a reliable, cost-effective solution.

One additional variable is my trawler’s ability to be in a congested city one week and a rural anchorage the following week. I wanted to share my evolution of internet solutions that eventually became an application-aware, load-balanced SD-WAN solution with a cloud firewall for performance, stability, and reliability. With centralized control and easy standardized policies across many sites, it has become easier to deploy a more complex design.

Before this, load balancing was always more theoretical than practical for me because I lived in neighborhoods with affordable high-speed broadband. That changed once I began living on a trawler for large segments of the year. This article summarizes my journey to find mobile, stable, high-speed Internet in areas with limited options while controlling monthly costs.

Like most users, I assumed the answer would be a mainstream mobile data connection with an “unlimited plan”; this might work for some people, but I am a very high-usage user. There are four PC operating systems, two tablets, and multiple IoT devices installed on the boat that require regular updates. I stream video for numerous hours a day, and a NAS runs data synchronization that tolerates a lower-quality connection but consumes a lot of data. I have previously reached the data caps on all the major networks, and each time speeds became unusable. These are the options I explored to resolve this:

- Pay for more bandwidth. This option was not practical financially at $10/GB and 20-70 GB of usage a month.

- Use WiFi and fail over to 4G LTE. Since the failover is triggered by a ping test, not a throughput or stability test, I was often switched back to a WiFi connection that could not meet my current throughput needs. This switch would cause buffering or dropped connections until I manually intervened by disabling the WiFi. Without constantly watching which link I was using, I could also accidentally run up a high usage bill (or get capped and moved to the too-slow speed queue).

- Use a low-cost, unlimited 4G connection. Of the many plans and providers, none met all my needs. I selected Visible, a prepaid MVNO on the VZW network (also owned by Verizon). The consumer phone plan comes with an unlimited 5 Mbps connection that is more than enough speed to support my streaming and conferencing needs; most importantly, there are no data caps. The plan does put you at a lower priority than VZW users; the effect is that, at times of network congestion, I can completely lose my connection. Visible also backhauls its traffic in a way that introduces enough latency to be problematic for some applications; I would see a 100 ms round-trip ping to nearly any destination. The high latency makes video conferencing (and gaming) more difficult.
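The failover problem in the second option comes from judging a link by reachability alone. As a rough illustration, here is a minimal Python sketch of the difference between a ping-only check and a check that also looks at loss, latency, and throughput. The LinkSample fields, the thresholds, and the sample measurements are all assumptions made up for this example; they are not values from this setup or from any firewall product.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One measurement of a link. All fields are hypothetical inputs."""
    name: str
    reachable: bool         # did a ping get any reply?
    loss_pct: float         # packet loss over the probe window
    latency_ms: float       # average round-trip time
    throughput_mbps: float  # measured download speed


def ping_only_ok(sample):
    """What a basic failover monitor checks: is the far end reachable?"""
    return sample.reachable


def usable(sample, need_mbps=3.0, max_loss_pct=5.0, max_latency_ms=250.0):
    """Stricter check: reachable AND good enough for the current workload."""
    return (sample.reachable
            and sample.loss_pct <= max_loss_pct
            and sample.latency_ms <= max_latency_ms
            and sample.throughput_mbps >= need_mbps)


# Made-up samples: the WiFi answers pings but delivers under 1 Mbps.
marina_wifi = LinkSample("WiFi", True, 2.0, 40.0, 0.8)
lte = LinkSample("4G LTE", True, 1.0, 100.0, 5.0)

for link in (marina_wifi, lte):
    print(link.name, "ping-only:", ping_only_ok(link), "usable:", usable(link))
```

With these sample numbers the WiFi link passes the ping-only check but fails the stricter one, which is the situation that caused the buffering described above.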

When you implement multi-carrier load balancing, you can break some applications if the design is not implemented properly. Early load balancers often used a simple weighted round-robin. The constant issue was that your public IP would continuously change, so applications that maintain a “session state” would break or need to reconnect. The next generation, still found on more basic devices, is session-aware load balancing: the firewall keeps track of TCP sessions and keeps each one on a single circuit/path. The downside is that traffic is randomly assigned to a circuit/path that might not be best suited for it. This approach does not consider how much bandwidth is available on that path, the latency/loss, or the application.
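The two generations can be contrasted in a few lines of Python. This is only a sketch under stated assumptions: the link weights are hypothetical, the addresses come from documentation ranges, and a real firewall tracks sessions in its state table rather than in a dictionary like this.

```python
import itertools

# Hypothetical weights: Visible gets two of every three assignments.
WEIGHTED_LINKS = ["Visible", "Visible", "WiFi"]


def per_packet_round_robin(num_packets):
    """Weighted round-robin per packet: one session is spread over links."""
    cycle = itertools.cycle(WEIGHTED_LINKS)
    return [next(cycle) for _ in range(num_packets)]


class SessionAwareBalancer:
    """Round-robin per new session, then pin that session to its link."""

    def __init__(self, weighted_links):
        self._cycle = itertools.cycle(weighted_links)
        self._table = {}  # (src ip, src port, dst ip, dst port) -> link

    def link_for(self, flow):
        # Only the first packet of a session picks a link.
        if flow not in self._table:
            self._table[flow] = next(self._cycle)
        return self._table[flow]


# One TCP session sending four packets.
flow = ("192.0.2.10", 50000, "198.51.100.7", 443)

print(per_packet_round_robin(4))
balancer = SessionAwareBalancer(WEIGHTED_LINKS)
print([balancer.link_for(flow) for _ in range(4)])
```

In the first case the four packets of one session leave through two different links, so the far end sees two source IPs. In the second case the session stays on one link, but that link was still chosen without regard to bandwidth, latency, or application.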

The first stage I implemented to resolve this was application-based rules that steer traffic to the appropriate path. Creating multiple performance groups solved most issues but in no way provided a reliable, smooth failover to the next Internet circuit.

Real-Time Communications --> Rules for Zoom, Teams, Webex --> Google Fi (Usage Based Bandwidth), Visible, WiFi

Low Priority --> Rules for OS updates and off-site backup --> WiFi, Visible

Default --> All other traffic --> Visible, WiFi
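The three rules above amount to an ordered path preference per traffic class. A minimal Python sketch of that lookup follows. The path order is copied from the rules, but the application names, the APP_CLASS mapping, and the pick_path helper are hypothetical; a real firewall would match on ports, hostnames, or address lists rather than on an application name.

```python
# Ordered path preference per traffic class, copied from the rules above.
POLICY = {
    "real-time": ["Google Fi", "Visible", "WiFi"],
    "low-priority": ["WiFi", "Visible"],
    "default": ["Visible", "WiFi"],
}

# Hypothetical application-to-class mapping for illustration only.
APP_CLASS = {
    "zoom": "real-time",
    "teams": "real-time",
    "webex": "real-time",
    "os-update": "low-priority",
    "offsite-backup": "low-priority",
}


def pick_path(app, links_up):
    """Return the first preferred path that is currently up, or None."""
    traffic_class = APP_CLASS.get(app, "default")
    for path in POLICY[traffic_class]:
        if path in links_up:
            return path
    return None


all_up = {"Google Fi", "Visible", "WiFi"}
print(pick_path("zoom", all_up))
print(pick_path("zoom", {"Visible", "WiFi"}))
print(pick_path("os-update", all_up))
print(pick_path("browser", {"WiFi"}))
```

Whenever the set of available links changes and pick_path returns a different path, the traffic leaves through a different public IP.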

Implementing the rules above helped but caused an issue whenever the environment changed and a path became unavailable. When traffic moved to another path, the NAT’s public IP address changed, causing multiple applications to crash or need to reconnect. Since path congestion could happen hourly or more frequently, this became frustrating. To resolve this, I moved to an architecture that I have used in many mission-critical applications. The solution was multiple VPN tunnels, over diverse paths, with an aggressive routing protocol, connected to a cloud-hosted firewall for egress to the public.
[Attachment: Cloud Firewall Drawing.jpg]
Using ZeroTier, a cross-platform SD-WAN cloud service, to deploy the VPN, I was able to push out the rules needed from a central interface in minutes. ZeroTier supports multipath bonding modeled on Linux Ethernet Bonding (LEB), with a few different bonding policies (defined here). ZeroTier did a good job explaining the uses of each in a table:
[Attachment: Screen Shot 2021-02-24 at 10.46.22 AM.png]
My goal is to aggregate the links to increase bandwidth on links with constantly changing speed; the balance-aware policy provides this functionality:

“Traffic is dynamically allocated and balanced across multiple links simultaneously according to the target allocation. Options allow for packet or flow-based processing and active-flow reassignment. Flows mediated over a recently failed links will be reassigned in a manner that respects the target allocation of the bond.”
[Attachment: Screen Shot 2021-02-24 at 4.58.03 PM.png]
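To make the quoted description more concrete, here is a toy Python model of flow-based balancing against a target allocation, including reassignment after a link fails. It is not ZeroTier's implementation: the FlowBalancer class, the integer weights, and the "most under-served link" rule are assumptions chosen for illustration, and a third link is included only so that failover has two survivors to balance across.

```python
from collections import Counter


class FlowBalancer:
    """Toy flow-based balancer; NOT ZeroTier's actual algorithm."""

    def __init__(self, target):
        self.target = dict(target)  # link -> integer weight (target share)
        self.flows = {}             # flow id -> link

    def _most_underserved(self):
        total_weight = sum(self.target.values())
        total_flows = len(self.flows) + 1  # include the flow being placed
        load = Counter(self.flows.values())

        def deficit(link):
            # Integer form of: wanted flows minus current flows.
            return self.target[link] * total_flows - load[link] * total_weight

        return max(self.target, key=deficit)

    def assign(self, flow_id):
        link = self._most_underserved()
        self.flows[flow_id] = link
        return link

    def fail(self, link):
        """Drop a failed link and re-place its flows on the survivors."""
        del self.target[link]
        orphans = [f for f, l in self.flows.items() if l == link]
        for flow_id in orphans:
            del self.flows[flow_id]
        for flow_id in orphans:
            self.assign(flow_id)


# Hypothetical 3:2:1 target allocation.
bond = FlowBalancer({"Visible": 3, "Google Fi": 2, "WiFi": 1})
for flow_id in range(6):
    bond.assign(flow_id)
before = Counter(bond.flows.values())

bond.fail("Visible")
after = Counter(bond.flows.values())
print(dict(before), dict(after))
```

In this model six flows settle at 3:2:1 across the three links, and after Visible fails its three flows are spread over the remaining links so that the surviving 2:1 ratio still holds.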
Bonding the circuits has a couple of effects that need to be considered in the application firewall rules. One is latency: if you bond a high-latency and a low-latency link, the tunnel is now as latent as the worst circuit. You also have to consider where your VPN server will be; many websites try to geolocate you automatically via your IP. If you host your cloud firewall on a public VPS, you might be blocked from streaming services. Streaming services blocklist hosting-provider addresses to prevent overseas users from bypassing geography-based content restrictions with VPN services.

Real-Time Communications --> Rules for Zoom, Teams, Webex --> Google Fi (Usage Based Bandwidth), Visible, WiFi *** w/ Local NAT egress to avoid latency

TV --> Rules for YouTube TV, Amazon, Netflix, etc. --> Visible, WiFi *** w/ Local NAT egress to avoid blacklists for hosts using VPN services

Default --> All other traffic --> Tunneled to cloud firewall over the aggregated Visible and public WiFi connection
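The reasoning behind these exceptions can be sketched in Python. The egress table follows the three rules above; the 100 ms Visible latency is the figure mentioned earlier, while the WiFi latency and the helper names are placeholders invented for this example. The sketch also ignores the extra round trip to the cloud firewall itself, so real tunnel latency would be higher.

```python
# Round-trip latency per link in ms. The Visible figure is from the text;
# the WiFi value is a made-up placeholder.
LINK_LATENCY_MS = {"Visible": 100, "WiFi": 30}

# Egress per traffic class, following the three rules above.
EGRESS = {
    "real-time": "local NAT",
    "tv": "local NAT",
    "default": "cloud firewall tunnel",
}


def bonded_latency_ms(links):
    """Worst-case latency of a bond: packets can wait on the slowest link."""
    return max(LINK_LATENCY_MS[name] for name in links)


def egress_for(traffic_class):
    """Unknown classes fall through to the default rule."""
    return EGRESS.get(traffic_class, EGRESS["default"])


print(bonded_latency_ms(["Visible", "WiFi"]))
print(egress_for("real-time"), "|", egress_for("web"))
```

Real-time traffic skips the bond because the bonded tunnel inherits the slower link's latency, and TV traffic skips it because the cloud firewall's address may be on a streaming blocklist.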

This implementation in my office successfully increased the available bandwidth by bonding the facility WiFi and the Visible circuit. My ping test to the cloud firewall has had less than a 1% loss rate for the last week. An hourly speedtest-cli script has shown the expected results of bonding the two circuits’ speeds. Unfortunately, due to Lake Erie being frozen, I have not tested the solution underway, but the nearby highway provides plenty of congestion for testing. What is next on this project? We will see if upgrading the Visible modem to 5G affects latency and congestion issues in the areas where it is available.
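A loss-rate check like the one mentioned above can be approximated with a short Python script. This is a sketch, not the script used here: it assumes the system ping accepts the Linux/BSD/macOS "-c" option and prints the usual "packets transmitted ... received" summary line, and the function names and sample summary are made up for illustration.

```python
import re
import subprocess

# Matches both the Linux ("9 received") and BSD/macOS ("9 packets received")
# summary formats.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_pct(ping_output):
    """Return packet loss in percent from a ping summary, or None if absent."""
    match = SUMMARY.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent


def measure_loss(host, count=20):
    """Run the system ping against a host and parse its loss percentage."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, check=False)
    return loss_pct(result.stdout)


# Made-up summary line, parsed without touching the network.
sample = "200 packets transmitted, 199 received, 0.5% packet loss, time 199301ms"
print(loss_pct(sample))
```

Calling measure_loss with the cloud firewall's tunnel address from a scheduled job and logging the result would give a week-long loss history to compare against the 1% figure.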

This solution is more common in business or government networks, where reliability is a significant concern. As work from home has become a fundamental part of business, I wonder how many consumer solutions will be based on this. Google Fi has already adopted a similar cloud firewall solution. My mobile phone maintains two constant VPN tunnels, measuring whether it can switch to WiFi without breaking packet flow. The Fi solution (and the cloud firewall) encrypts all traffic passing through public WiFi networks, which is a nice additional benefit.

This solution was a fun effort at using a traditional enterprise toolset to solve a consumer issue.

Project sources

Open source firewall based on FreeBSD/pfSense: https://opnsense.org/

Source-available (Business Source License) with a free tier of service: https://www.zerotier.com/

Open source “Rooter” OS that is used for WiFi Client Management and 4G/5G modem: https://www.ofmodemsandmen.com/

An alternative solution, based on Multipath TCP, if you do not like using ZeroTier: https://www.openmptcprouter.com/