When faster isn’t better: How per-packet load balancing throttles your critical traffic

Per-packet load balancing can slash TCP throughput by more than 70% in multi-path networks. Learn why per-session routing and latency controls protect performance.

Tayo Ogunseyinde
Systems Engineer
  • Read Time: 5 min
  • Published: November 2, 2025
  • Modified: May 28, 2026

Summary

Per-packet load balancing promises higher bandwidth but can cut TCP throughput by more than 70% when path latencies diverge. Effective load balancing in SD-WAN optimization requires per-session steering for TCP flows, tight latency homogeneity across bonded links, and modern congestion control algorithms such as TCP BBR to maintain performance across heterogeneous paths.

  • Per-packet load balancing causes TCP to misinterpret out-of-order arrivals as congestion, triggering retransmissions and shrinking congestion windows dramatically.
  • Latency differences exceeding 50ms between bonded paths can crash TCP throughput by over 70%, crippling latency-sensitive applications.
  • Per-session load balancing preserves packet order within each flow, making it the preferred default for TCP-heavy enterprise environments.
  • Versa SD-WAN treats paths as equal only when latencies differ by less than 10%, enforcing the homogeneity TCP requires.
  • TCP BBR maintains roughly 80% throughput even with 100ms latency swings, significantly outperforming older CUBIC TCP under variable conditions.

Imagine your network as a highway. Per-packet load balancing splits your data into tiny cars and sends them down multiple lanes, promising faster speeds.

But what if some lanes have hidden potholes and traffic jams? For TCP, the protocol powering most of your critical apps, per-packet load balancing can backfire spectacularly.

Despite its bandwidth benefits, this approach can cripple TCP throughput in cloud environments. So, how can you avoid becoming a victim of your own network’s “efficiency”?

The hidden trap: How out-of-order packets strangle TCP

TCP, the workhorse of web traffic, thrives on predictability. Its loss detection assumes packets arrive in order and within a stable timeframe.

But per-packet load balancing tosses this logic out the window by routing packets across paths with mismatched latencies. When packets take different routes, they arrive out of sequence.

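TCP mistakes this chaos for packet loss. Each time a segment arrives ahead of one still in transit, the receiver repeats its last acknowledgment (ACK); after three of these duplicate ACKs, the sender assumes the missing segment was lost, retransmits it, and cuts its congestion window, slamming the brakes on data flow. High-latency paths, such as satellite links, also delay ACKs, tricking TCP into thinking the network is congested.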

The result? Shrinking congestion windows and stalled transfers. In extreme cases, throughput plummets by more than 70% – a death knell for latency-sensitive apps like video calls or cloud databases.
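To make the mechanism concrete, here is a minimal Python sketch that assumes the classic rule from RFC 5681, in which three duplicate ACKs trigger a fast retransmit. It replays segments in their arrival order at a receiver that acknowledges every segment cumulatively, and reports which segments the sender would resend. The function name and the simplified duplicate-ACK model are illustrative only; modern stacks add reordering defenses (such as RACK), so real behavior varies.

```python
DUPACK_THRESHOLD = 3  # classic fast-retransmit trigger (RFC 5681)

def spurious_fast_retransmits(arrival_order):
    """Replay segment numbers in arrival order and return the segments the
    sender would fast-retransmit. Nothing is lost in these inputs, so every
    retransmission returned here is spurious."""
    received = set()
    next_expected = 0   # cumulative ACK: lowest segment number not yet received
    last_ack = None
    dup_acks = 0
    retransmitted = []
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        ack = next_expected                # receiver ACKs every arriving segment
        if ack == last_ack:
            dup_acks += 1                  # same ACK value again: a duplicate ACK
            if dup_acks == DUPACK_THRESHOLD:
                retransmitted.append(ack)  # sender resends the "missing" segment
        else:
            last_ack, dup_acks = ack, 0
    return retransmitted

print(spurious_fast_retransmits([0, 1, 2, 3, 4]))  # []
print(spurious_fast_retransmits([0, 2, 1, 3, 4]))  # [] (one-place swap absorbed)
print(spurious_fast_retransmits([0, 2, 3, 4, 1]))  # [1]
```

No segment is lost in any of these runs. A one-place swap is absorbed, but a segment overtaken by three later ones is retransmitted anyway, and the sender reduces its congestion window as though the network were congested.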

Why your network’s “speed boost” fails TCP

Per-packet load balancing maximizes raw bandwidth but ignores TCP’s need for orderly delivery. It’s like serving a gourmet meal course by course but shuffling the dishes randomly.

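For example, bond a 50ms terrestrial link with a 500ms satellite path, and segments sent over the satellite, along with their ACKs, arrive hundreds of milliseconds after those sent over the terrestrial link. By the time they land, TCP’s duplicate-ACK logic and retransmission timers have already fired, triggering unnecessary retransmissions and throttling speeds to a crawl.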
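How deep does the reordering get? The sketch below is a rough model, not a measurement: it treats the 50ms and 500ms figures as one-way path delays, assumes a hypothetical sender emitting one segment per millisecond, and sprays segments round-robin across the two paths. It then counts the largest number of later-sent segments that overtake any single segment. The function names are hypothetical.

```python
def arrival_order(n_segments, send_interval_ms, path_delays_ms):
    """Per-packet round-robin: segment i travels on path i % len(path_delays_ms).
    Returns segment numbers in the order they reach the receiver."""
    arrivals = []
    for seg in range(n_segments):
        delay = path_delays_ms[seg % len(path_delays_ms)]
        arrivals.append((seg * send_interval_ms + delay, seg))
    arrivals.sort()  # by arrival time; ties go to the lower segment number
    return [seg for _, seg in arrivals]

def max_overtaken(order):
    """Largest number of later-sent segments that arrive ahead of any one segment."""
    worst = 0
    for pos, seg in enumerate(order):
        overtakers = sum(1 for earlier in order[:pos] if earlier > seg)
        worst = max(worst, overtakers)
    return worst

print(max_overtaken(arrival_order(1000, 1, [50, 500])))  # 225
print(max_overtaken(arrival_order(1000, 1, [50, 55])))   # 2
```

Each satellite-bound segment is overtaken by (500 - 50) / 2 = 225 terrestrial segments, because the terrestrial path carries every other segment sent during its 450ms head start. That is far beyond the three duplicate ACKs classic TCP needs to declare a loss. With paths of 50ms and 55ms, the overtake depth drops to 2, which a three-duplicate-ACK threshold absorbs.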

Not all latency differences are created equal. If the difference is less than 10ms, it’s smooth sailing.

However, a 10–50ms difference causes throughput to drop by 20–40% due to frantic retransmissions. When the difference exceeds 50ms, throughput crashes by over 70% as TCP repeatedly shrinks its congestion window.
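As a quick planning aid, the rule of thumb above can be encoded as a lookup on the spread between the fastest and slowest bonded path. The bands are this article’s figures, not the output of a TCP model, and the helper names `rtt_spread_ms` and `impact_band` are hypothetical.

```python
def rtt_spread_ms(path_rtts_ms):
    """Latency difference between the slowest and fastest bonded path."""
    return max(path_rtts_ms) - min(path_rtts_ms)

def impact_band(path_rtts_ms):
    """Map the spread to the rough throughput impact quoted in this article."""
    spread = rtt_spread_ms(path_rtts_ms)
    if spread < 10:
        return "minimal impact"
    if spread <= 50:
        return "20-40% throughput loss"
    return "over 70% throughput loss"

print(impact_band([32, 38]))      # minimal impact
print(impact_band([32, 58, 70]))  # 20-40% throughput loss
print(impact_band([50, 500]))     # over 70% throughput loss
```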

What you should do: SD-WAN and smarter TCP

To address these issues, ditch per-packet load balancing for TCP and embrace per-session load balancing. Per-session load balancing keeps all packets in a flow on one path, preserving order.
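The difference between the two strategies comes down to how the next path is chosen. In this minimal sketch, per-session balancing hashes a flow’s 5-tuple so every packet of that flow lands on the same path, while per-packet balancing simply rotates through the paths. The path names, the example flow, and the use of SHA-256 are illustrative assumptions; real devices use their own, typically hardware-friendly, hash functions.

```python
import hashlib
from itertools import count

PATHS = ["mpls", "broadband"]  # hypothetical WAN paths

def per_session_path(five_tuple, paths):
    """Pin a flow to one path by hashing its 5-tuple.

    hashlib is used instead of hash(), which is salted per process for strings,
    so the mapping stays stable across restarts.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return paths[bucket % len(paths)]

class PerPacketRoundRobin:
    """Send each successive packet down the next path, regardless of flow."""

    def __init__(self, paths):
        self.paths = paths
        self._counter = count()

    def next_path(self):
        return self.paths[next(self._counter) % len(self.paths)]

flow = ("10.0.0.5", 51512, "203.0.113.10", 443, "tcp")
print([per_session_path(flow, PATHS) for _ in range(4)])  # one path, four times
rr = PerPacketRoundRobin(PATHS)
print([rr.next_path() for _ in range(4)])  # ['mpls', 'broadband', 'mpls', 'broadband']
```

Different flows still hash to different paths, so aggregate capacity is shared across links; only each individual flow is pinned, which is what keeps its packets in order.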

Per-session steering is the default for SD-WAN solutions like those offered by Versa, which steer traffic based on real-time latency checks. Steps you should take include:

  • Reserve per-packet load balancing for UDP traffic, such as video streaming, which does not depend on in-order delivery at the transport layer.
  • Enforce latency homogeneity by bonding only links with similar latency (a difference of less than 20ms). Versa SD-WAN, for example, treats paths as “equal” only if their latencies differ by less than 10%.
  • Upgrade your TCP stack to modern algorithms like TCP BBR, used in Versa’s TCP proxy. BBR maintains 80% throughput even with 100ms latency swings, compared to 30% for older CUBIC TCP.
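Where you control the end hosts, Linux also lets an application request BBR for an individual TCP socket through the `TCP_CONGESTION` socket option, which Python exposes on Linux. This is a host-side sketch, separate from (and not a description of) Versa’s TCP proxy; it assumes the kernel provides BBR, and the helper name `try_enable_bbr` is hypothetical. The system-wide default is set instead through the `net.ipv4.tcp_congestion_control` sysctl.

```python
import socket

def try_enable_bbr(sock):
    """Ask the Linux kernel to use BBR for this one TCP socket.

    Returns True on success, False if the platform lacks TCP_CONGESTION or the
    kernel refuses (BBR not available, or not listed in
    net.ipv4.tcp_allowed_congestion_control for an unprivileged process).
    """
    if not hasattr(socket, "TCP_CONGESTION"):  # constant exists on Linux only
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
    except OSError:
        return False
    # Read the setting back; the kernel returns a NUL-padded algorithm name.
    current = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return current.rstrip(b"\x00") == b"bbr"

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("BBR enabled:", try_enable_bbr(s))
```

The option only changes how this host sends; it does nothing for traffic in the other direction, which is one reason a TCP proxy in the path can be attractive.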

The bottom line

Per-packet load balancing isn’t evil – it’s just context-sensitive. For TCP-heavy enterprises, the best approach is to default to per-session load balancing, bond links with tight latency controls (ΔRTT <20ms), and deploy SD-WAN for dynamic path selection and TCP optimizations.
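The bottom-line policy can be expressed as two small checks. This sketch uses the lowest-latency path as the reference point, so every bonded path stays within 20ms of it and therefore within 20ms of each other; the `Path` type, the path names, and the RTT values are hypothetical.

```python
from dataclasses import dataclass

MAX_DELTA_RTT_MS = 20  # bonding threshold recommended in this article

@dataclass
class Path:
    name: str
    rtt_ms: float

def bondable_paths(paths, max_delta_ms=MAX_DELTA_RTT_MS):
    """Return the paths whose RTT is within max_delta_ms of the fastest path."""
    best = min(p.rtt_ms for p in paths)
    return [p for p in paths if p.rtt_ms - best < max_delta_ms]

def steering_mode(protocol):
    """Default to per-session steering; spray per packet only for UDP."""
    return "per-packet" if protocol.lower() == "udp" else "per-session"

paths = [Path("mpls", 32), Path("broadband", 41), Path("lte", 78), Path("satellite", 560)]
print([p.name for p in bondable_paths(paths)])  # ['mpls', 'broadband']
print(steering_mode("TCP"), steering_mode("udp"))  # per-session per-packet
```

In a live SD-WAN these checks would be re-evaluated continuously from measured latencies rather than from static values like the ones above.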

By aligning load balancing strategies with protocol quirks, you can dodge the hidden pitfalls and keep your cloud apps running smoothly.

If you would like to get in touch to discuss your SD-WAN deployment, please drop us a line here!


By Tayo Ogunseyinde

Systems Engineer

Tayo Ogunseyinde has 20 years of network engineering and design experience and supports customers and partners across EMEA on Versa SD-WAN and SASE deployments. He writes deeply technical content on networking and routing aimed at experienced network engineers and speaks on AI-driven networking at industry events.

FAQs

Per-session load balancing is a traffic distribution method that keeps all packets within a single TCP flow on one network path, preserving packet order. Unlike per-packet load balancing, which splits individual packets across multiple links, per-session approaches prevent out-of-order delivery that triggers unnecessary retransmissions and throughput degradation in TCP-dependent enterprise applications.

Per-packet load balancing distributes individual packets across multiple paths to maximize raw bandwidth, but causes out-of-order delivery that can cut TCP throughput by more than 70%. Per-session load balancing routes all packets in a flow along one path, preserving TCP's expected packet order. Per-packet methods remain suitable for UDP traffic like video streaming, where order is irrelevant.

When per-packet distribution routes TCP packets across paths with mismatched latencies, packets arrive out of sequence. TCP interprets this reordering as packet loss, shrinking its congestion window and triggering unnecessary retransmissions. High-latency paths delay acknowledgments, compounding the effect. Latency gaps exceeding 50 milliseconds between bonded links can crash TCP throughput by over 70%.

Bonding links with similar latency – ideally with a difference of less than 20 milliseconds – prevents TCP from misinterpreting reordered packets as congestion. Latency gaps under 10 milliseconds maintain smooth throughput, while gaps of 10–50 milliseconds reduce throughput by 20–40%. Enforcing latency homogeneity preserves performance for cloud databases, video conferencing, and other latency-sensitive enterprise applications without sacrificing multi-link bandwidth.

Enterprises should evaluate three capabilities: per-session load balancing as the default for TCP traffic, real-time latency monitoring that bonds only paths within tight thresholds (ΔRTT under 20 milliseconds), and modern TCP algorithms like BBR. BBR maintains approximately 80% throughput even with 100-millisecond latency swings, compared to roughly 30% for legacy CUBIC TCP stacks.
