The hidden cost of QUIC and TOU

Application-specific UDP-based protocols have always been around, but with traffic volumes that are largely rounding errors. Recently the idea of using UDP has become a lot more respectable. The IETF has started the ball rolling on standardizing QUIC, Google’s UDP-based combination of TCP+TLS+HTTP/2. And Facebook published Linux kernel patches to add an encrypted UDP encapsulation of TCP, TOU (Transports over UDP). At a very high level, the approaches are dramatically different.

QUIC is a totally new design that can really experiment at the protocol level, but requires implementations to start from scratch. Some of the new features are compelling (e.g. proper multiplexing of multiple data streams), while I have my doubts about a few others (e.g. forward error correction). TOU is a conservative evolution, and pretty much includes just one actual new feature. But it can fully leverage the host TCP stack on the server. The client would still require a user space TCP stack and user space TOU encapsulation.
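
To make the difference concrete, here’s a rough sketch in Python/scapy of the kind of layering TOU implies: an ordinary TCP segment carried as the opaque payload of a UDP packet. The field values and the framing are purely illustrative, not the actual TOU wire format, and the encryption step is left out.

    # Illustrative only: NOT the real TOU wire format, just the structural idea
    # of "a TCP segment carried inside a UDP payload". In the real protocol the
    # payload would additionally be encrypted and authenticated, so a middlebox
    # only ever sees the outer IP/UDP headers.
    from scapy.all import IP, UDP, TCP

    # An ordinary TCP segment, as produced by the host TCP stack on the server
    # or a user space TCP stack on the client. (scapy may warn about computing
    # the TCP checksum without an IP underlayer; that's fine for a sketch.)
    inner_tcp = TCP(sport=443, dport=51000, flags="S", seq=1000,
                    options=[("MSS", 1460)])

    # Encapsulate: the whole TCP segment becomes opaque UDP payload.
    tou_like = IP(dst="192.0.2.1") / UDP(sport=443, dport=51000) / bytes(inner_tcp)
    tou_like.show()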

But despite the difference in designs, the goals are very similar. Both proposals attempt to speed up protocol evolution by decoupling the protocol from the client OS and moving it to the application. (The companies that designed these protocols happen to control the servers and the client application program, but not really the client OS.) They’d also both add support for connection migration in a way that should be more deployable than Multipath TCP. It’s hard to argue against either of these ideas.
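
Connection migration is the easier of the two to picture in code. If the server keys its connection state on a connection ID it handed out itself, rather than on the classic 4-tuple, a client can hop from Wi-Fi to LTE and keep the connection alive. The sketch below is a toy version of that lookup; it is not QUIC’s or TOU’s actual mechanism, and the 8-byte connection ID prefix is an assumption made up for the example.

    # Toy sketch: connection lookup keyed by a connection ID instead of the
    # (src ip, src port, dst ip, dst port) 4-tuple. Not the real QUIC or TOU
    # mechanism, just the idea that makes migration work.
    connections = {}  # connection_id -> per-connection state

    def handle_datagram(src_addr, datagram):
        # Assume (for this sketch only) that the first 8 bytes of every
        # datagram carry the connection ID.
        conn_id = datagram[:8]
        state = connections.get(conn_id)
        if state is None:
            state = {"peer": src_addr, "bytes_received": 0}
            connections[conn_id] = state
        elif state["peer"] != src_addr:
            # The client moved networks: same connection ID, new address.
            # With a 4-tuple key this would have looked like a new connection.
            state["peer"] = src_addr
        state["bytes_received"] += len(datagram)
        return state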

And then there’s the third big commonality. Both proposals encrypt
and authenticate the layer 4 headers. This is the bit that I’m
uneasy about.

The recent movement to get all traffic encrypted has of course been
great for the Internet. But the use of encryption in these protocols is different from its use in TLS. In TLS, the goal was to ensure the privacy and integrity of the payload. It’s almost axiomatic that third parties should not be able to read or modify the web page you’re loading over HTTPS. QUIC and TOU go further: they encrypt the control information, not just the payload. This provides no meaningful privacy or security benefit.

Instead the apparent goal is to break the back of middleboxes [0].
The idea is that TCP can’t evolve due to middleboxes and is pretty
much fully ossified. They interfere with connections in all kinds of
ways, like stripping away unknown TCP options or dropping packets
with unknown TCP options or with specific rare TCP flags set. The
possibilities for breakage are endless, and any protocol extensions
have to jump through a lot of hoops to try to minimize the damage.
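
To make the ossification complaint concrete, here’s roughly what an option-stripping middlebox does to a TCP SYN, sketched with scapy. The whitelist of “known” options is invented for the example; real devices each have their own idea of what is safe to pass.

    # Rough sketch of an option-stripping middlebox rewriting a TCP SYN.
    # The whitelist is illustrative; real devices differ.
    from scapy.all import IP, TCP

    KNOWN_OPTIONS = {"MSS", "NOP", "WScale", "SAckOK", "Timestamp", "EOL"}

    def strip_unknown_options(pkt):
        if TCP not in pkt:
            return pkt
        pkt[TCP].options = [opt for opt in pkt[TCP].options
                            if opt[0] in KNOWN_OPTIONS]
        # Force scapy to recompute lengths and checksums after the rewrite.
        del pkt[IP].len, pkt[IP].chksum, pkt[TCP].chksum
        return pkt

    # A SYN asking for a TCP Fast Open cookie (option kind 34) loses it here,
    # and the endpoints silently fall back to a plain handshake.
    syn = IP(dst="198.51.100.1") / TCP(dport=443, flags="S",
                                       options=[("MSS", 1460), ("WScale", 7),
                                                (34, b"")])
    print(strip_unknown_options(syn)[TCP].options)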

It’s almost an extension of the end-to-end principle. Not only
should protocols be defined such that functionality that can’t be
implemented correctly in the network is defined in the application.
Protocols should in addition be defined such that it’s not possible
for the network to know anything about the traffic, lest somebody
try to add any features at that level. Dumb pipes all the way!

It’s a compelling story. I’m even pretty sympathetic to it, since in my line of work I see a lot of cases where obsolete or badly configured middleboxes cause major performance degradation. (See this HN comment for an example.)

But let’s take the recent findings about the deployability of TCP Fast Open as an example. The headline number is absolutely horrific: a 20% failure rate! But that actually appears to be 20% where TCP Fast Open can’t be successfully negotiated, not 20% where connections fail. And this is for the absolute worst case; TCP Fast Open doesn’t just add new TCP options, it effectively modifies the TCP state machine for the handshake. I’ve implemented a bunch of TCP extensions over the years, and TCP Fast Open was by far the hardest to get right.

Compared to the reported 8% failure rate for negotiating a QUIC connection, that number looks totally reasonable. (In both cases there is a fallback to negotiating a different type of connection, and blacklists are used to go directly to the fallback method the next time around.) But somehow one of these is deemed acceptable, while the other is a sign of terminal ossification [1].
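
For what it’s worth, the fallback-and-blacklist pattern that both camps rely on is simple enough to sketch. The try_quic and try_tcp callables below are hypothetical placeholders, and the one-hour blacklist TTL is an arbitrary choice for the example.

    # Sketch of the "try the new protocol, fall back, and remember" pattern
    # used for both QUIC and TCP Fast Open. try_quic() and try_tcp() are
    # hypothetical placeholders, not real APIs.
    import time

    FALLBACK_TTL = 3600   # seconds to keep going straight to the fallback
    _blacklist = {}       # host -> timestamp of the last negotiation failure

    def connect(host, try_quic, try_tcp):
        failed_at = _blacklist.get(host)
        if failed_at is not None and time.time() - failed_at < FALLBACK_TTL:
            # Negotiation failed recently: skip the latency penalty of trying
            # (and timing out) again, and go straight to the fallback.
            return try_tcp(host)
        try:
            return try_quic(host)
        except OSError:
            _blacklist[host] = time.time()
            return try_tcp(host)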

What you lose with encrypted headers

What’s wrong with encrypted transport headers? One possible argument
is that middleboxes actually serve a critical function in the network,
and crippling them isn’t a great idea. Do
you really want a world where firewalls are unviable? But I work on
middleboxes, so of course I’d say that. (Disclaimer: these are my
own opinions, not my employer’s). So let’s ignore that. Even so,
readable headers have one killer feature: troubleshooting.

The typical network problem that my team gets to troubleshoot is some kind of traffic either not working at all, or working slower than it should. So something like the following [2]:

  • Users are complaining that YouTube videos only play in SD, but are
    choppy in HD.
  • Speedtest is showing 10Mbps on an LTE connection
    that should be able to do 50Mbps.
  • Large FTP transfers between machines in Germany and Singapore
    are only getting speeds of 2Mbps.
  • Uploads over a satellite link are so slow that they stall and
    get terminated rather than ever finish.

To debug issues like this I start with a packet capture from the
points in the network I have access to. Most of the time that’s just
a point in the middle (e.g. a mobile operator’s core network). From
just one trace, we can determine things such as the following (a rough sketch of this kind of analysis follows the list):

  • Determine packet loss rates (on both sides, i.e. packets lost on
    the server -> core hop, and on the core -> client hop).
  • Correlate packet loss with other events.
  • Detect packet reordering rates (on both sides).
  • Detect packet corruption rates (on both sides).
  • Determine RTTs continuously over the lifetime of a connection, not
    just during a connection handshake (e.g. to use queuing as a
    congestion signal to establish the downlink as the bottleneck).
  • Estimate sender congestion windows from observed delivery rates
    (to determine whether congestion control is the bottleneck).
  • Inspect the TCP options (e.g. window scaling, mss) and the receive
    windows to determine whether the software on the client or the server
    is the bottleneck.
  • Distinguish between pure control packets and data packets (e.g. to
    distinguish multiple separate HTTPS requests within a single TCP
    connection).
  • Detect the presence of middleboxes that are interfering with the
    connection. (But only occasionally; more often you’ll need multiple
    traces for this).
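
As a rough illustration of that list, here’s a minimal single-trace analysis sketch using scapy. It looks at one direction of one connection and pulls out retransmissions, handshake options, the client’s advertised receive window, and RTT samples for the capture-point-to-client hop. The file name and the server address are placeholders, and a real tool would obviously handle sequence number wraparound, window scaling, SACK and so on.

    # Minimal sketch of single-trace analysis with scapy. Placeholders:
    # "trace.pcap" and SERVER. A real tool would handle sequence wraparound,
    # window scaling, SACK, multiple connections, etc.
    from scapy.all import rdpcap, IP, TCP

    SERVER = "203.0.113.10"   # the server side of the connection of interest

    packets = rdpcap("trace.pcap")
    seen_seq = set()          # data sequence numbers already observed
    in_flight = {}            # expected ack number -> capture timestamp
    retransmits = 0
    rtt_samples = []
    syn_options = None
    min_client_window = None

    for pkt in packets:
        if IP not in pkt or TCP not in pkt:
            continue
        tcp = pkt[TCP]
        if pkt[IP].src == SERVER:
            # Server -> client direction: handshake options and data segments.
            if tcp.flags & 0x02:                      # SYN / SYN-ACK
                syn_options = tcp.options
            paylen = len(tcp.payload)
            if paylen:
                if tcp.seq in seen_seq:
                    retransmits += 1                  # same data seen twice
                seen_seq.add(tcp.seq)
                in_flight[tcp.seq + paylen] = pkt.time
        elif pkt[IP].dst == SERVER:
            # Client -> server direction: ACKs and the advertised window.
            if min_client_window is None or tcp.window < min_client_window:
                min_client_window = tcp.window
            sent_at = in_flight.pop(tcp.ack, None)
            if sent_at is not None:
                # Time from data passing the capture point to the ACK coming
                # back: the capture-point-to-client part of the RTT.
                rtt_samples.append(float(pkt.time) - float(sent_at))

    print("retransmitted segments:", retransmits)
    print("server handshake options:", syn_options)
    print("min advertised client window:", min_client_window)
    if rtt_samples:
        print("median RTT sample:", sorted(rtt_samples)[len(rtt_samples) // 2])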

We do most of this with some specialized tools. But it’s essentially no different from opening up the trace in Wireshark, following a connection with disappointing performance, and figuring out what happened. That’s something every network engineer probably does on a regular basis.

With encrypted control information you can’t figure out any of this. The only solid data you get is the throughput (not even the goodput). For anything more, you need traces from multiple points in the network. Those are hard to get, and sometimes outright impossible. And to do the analysis, you need to correlate those multiple traces with each other. That’s a significantly higher barrier than just opening up Wireshark. In practice the network becomes a total black box, even to the people who are supposed to keep it running. That’s not going to be a great place to be.

Conclusion

To conclude, I think encrypting the L4 headers is a step too
far. If these protocols get deployed widely enough (a distinct
possibility with standardization), the operational pain will be
significant.

There would be a reasonable middle ground where the headers are
authenticated but not encrypted. That prevents spoofing and
modifying packets, but still leaves open the possibility of
understanding what’s actually happening to the traffic.
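
That middle ground falls straight out of how AEAD ciphers work: put the transport header into the associated data, so it stays readable on the wire but any tampering makes decryption fail, and encrypt only the payload. Here is a minimal sketch with AES-GCM from the Python cryptography package; the header layout and the key handling are placeholders.

    # Minimal sketch of "authenticated but not encrypted" headers using an
    # AEAD cipher (AES-GCM from the cryptography package). The header format
    # and key handling are placeholders; the point is only where the header
    # goes.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    aead = AESGCM(key)

    def seal(header: bytes, payload: bytes) -> bytes:
        nonce = os.urandom(12)
        # The header goes in as associated data: it stays readable on the
        # wire, but modifying it makes decryption fail. Only the payload is
        # actually encrypted.
        return header + nonce + aead.encrypt(nonce, payload, header)

    def unseal(packet: bytes, header_len: int) -> bytes:
        header = packet[:header_len]
        nonce = packet[header_len:header_len + 12]
        ciphertext = packet[header_len + 12:]
        # Raises InvalidTag if either the header or the payload was tampered
        # with in transit.
        return aead.decrypt(nonce, ciphertext, header)

    pkt = seal(b"seq=42 ack=17 wnd=65535 ", b"hello")
    print(unseal(pkt, header_len=24))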

Footnotes
