Maximum transmission unit and maximum segment size
Magic Transit has operation requirements that customers should know about to make sure their network works as intended. Customers should pay particular attention to the maximum transmission unit (MTU) and maximum segment size (MSS) values. The incorrect configuration of these values might lead to loss of performance or inability to deliver data packets.
The maximum transmission unit (or MTU) ↗ is a measurement representing the largest data packet that a network-connected device will accept. The MTU almost always applies to Layer 3 of the OSI model in networking, and includes the entire packet, including all headers (TCP, IP, etc.) and the data (payload) itself. For example, packets must be no bigger than 1,500 bytes to be routable through the Internet.
The maximum segment size (or MSS) ↗ refers to the amount of data that can be sent in a single TCP datagram packet. This value is determined by subtracting the size of the IP and TCP headers from the MTU, which will instruct the router how large the payload can be. It applies to Layer 4 of the OSI model in networking.
One common misconception about MSS/MTU is that setting these values negatively impacts performance. While there is a slight performance penalty, it is worse not to configure these values to account for the specificities of your network.
Since Magic Transit uses encapsulation to deliver its services, it is also important to understand why MTU and MSS matter in this case.
Encapsulation adds bytes to the packet, since we add a new IP header and (often) some sort of encapsulating header to every packet. For example, in the case of GRE for IPv4, we add 24 bytes – 20 bytes for the IPv4 header, and 4 bytes for the GRE tunnel header.
A network interface which performs GRE encapsulation needs to account for the added overhead by reducing its MTU. Since the MTU maximum size is 1,500 bytes, for IPv4 this means that the MTU can be 1,476 bytes (the original 1,500 bytes minus the 24 bytes from the GRE encapsulation). This reduced MTU defines the maximum size of the IP packet that can be encapsulated by GRE.
If the data packet is larger than what the network interface can accept, it either needs to be dropped or fragmented into smaller packets. When fragmentation occurs, Cloudflare only accepts data packets that can be completely reassembled. If some fragments are missing, all received fragments are discarded. Cloudflare does not forward incomplete packets to the customer.
Setting the do not fragment
(DF) bit in the TCP header to 1
denotes that the packet must be dropped rather than fragmented if it is larger than the MTU that intermediary network devices can accept. Most TCP implementations set the do not fragment
(DF) bit to 1
to avoid the potential issues caused by fragmentation.
If you are experiencing issues with fragmentation and are unable to set an MSS clamp, Cloudflare can clear the do not fragment
(DF) bit for you. When this option is enabled, Cloudflare fragments packets greater than 1,500 bytes, and the packets are reassembled on your infrastructure after decapsulation. This should be a last resort option. Contact your account team for more information.
Fragmentation in Magic Transit
Consider a UDP datagram of size 3,000 bytes (8 bytes for the UDP header + 2,992 bytes for the UDP data). To fit within a standard 1,500 bytes MTU, this UDP datagram would be fragmented across three IP packets as follows:
Suppose that the UDP datagram has source port 389
and is destined for a Magic Transit customer IP address. Let us also suppose that the Magic Transit customer has a firewall rule in place that drops UDP traffic with source port 389
, a common Connectionless Lightweight Directory Access Protocol (CLDAP) ↗ reflection attack vector. The three packet fragments above will arrive at Cloudflare, but only the first fragment contains a UDP header with source port information — the second and third fragments, while they contain UDP data, do not have UDP header information. So the question is: which of these fragments does Cloudflare drop and which are delivered to the customer? If we only drop the first parts of fragmented packets, the remaining parts could still generate a large amount of traffic during a DoS attack.
The diagram below shows how the three UDP fragments in the example above flow through Cloudflare and Magic Transit. However, the main takeaways are:
- Cloudflare will never send incomplete packets to a customer: If we do not see all parts of a packet required to fully reassemble that packet, we will not send the partial data fragments on to the customer.
- Magic Firewall operates on fully reassembled packets, not individual fragments: This means that filters that match on UDP/TCP header information, for example, will apply to the fully reassembled packet, not just the initial fragment. So, we will not leak non-initial fragments to customers.
- Customers can still see fragmented packets: By default (without
clear_dont_fragment_bit
set), we fragment packets to fit within the configured MTU of the tunnel before sending the data to the customer. If a packet is larger than 1,476 bytes, we will fragment it and send those fragments to the customer for reassembly.
In all cases, Cloudflare sends all fragments to the customer.
Maximum segment size (MSS) is a TCP setting that limits the size of TCP segments. This option is set in the SYN packets during the three-way handshake.
By default, a TCP endpoint will set its MSS value based on its local network interface MTU. For example, for IPv4, if the MTU is 1,500 bytes then MSS will be 1,460 bytes (1,500 bytes minus 20 bytes from the IPv4 header minus 20 bytes from the TCP header).
MSS is a tool that can be used to configure TCP packet size behavior. If a TCP endpoint is known to be behind a network with reduced MTU, changing the MSS value to match the actual path MTU value will force remote endpoints to send packets that fit within the specified MTU. So, if an IPv4 TCP endpoint is known to be behind a GRE tunnel with an MTU of 1,476 bytes, the MSS value in its TCP SYN packets should be 1,436 bytes — 1,476 bytes minus the 20 bytes from the IPv4 header, minus the 20 bytes from the TCP header.
One way to modify the MSS setting is by changing the MTU of the network interface in the router’s WAN interface to match the path MTU. Another way to modify MSS is by applying a MSS clamp, where an intermediary network device — such as a router — is configured to modify the MSS TCP option on-the-fly when packets pass through it. Note that changing the MTU on the interface of an intermediary network device is not the same as applying an MSS clamp, and it will not cause the TCP MSS value to be changed.
Refer to MSS clamping recommendations for information on what you should set your MSS clamping to, depending on the type of tunnel.
Asymmetric routing is a common scenario especially with Magic Transit. Ingress traffic from the Internet enters the Cloudflare network, then traverses a GRE tunnel (MTU of 1,476 bytes), and egress traffic from the datacenter is sent via Direct Server Return (DSR) over the Internet (MTU of 1,500 bytes).
In an asymmetric scenario, we want to reduce the MSS value of packets sent by Magic Transit users to the Internet in order to reduce the size of packets sent from the Internet towards their network. To accomplish this, the configuration must be done either on the customer’s end-hosts or through an MSS clamp on an intermediary device on the egress path of traffic leaving their network. How MSS values affect payload sizes on both routing paths is detailed below.
Key takeaway from the chart above: MSS clamping affects TCP packet payload sizes flowing in the opposite direction vs. where the clamp is applied.
MSS clamping only affects TCP traffic. If, for example, you have a web server on your Magic Transit prefix, then the MSS clamp will take effect on the TCP data from direct server return traffic. However, be aware that you will have to take a different approach for any tunnels inside of your Magic Transit tunnel (tunnel-in-tunnel scenario).
For example, if you have a Magic Transit GRE tunnel set up, and then another IPsec or GRE tunnel running from third-party devices on your premises, MSS clamp will have no impact on the outer packets of the encapsulated traffic. This is because MSS clamping affects only TCP traffic, and IPsec/GRE encapsulated traffic is IP. For this scenario, you will have to lower the MTU of the internal tunnel interface further, both for your ingress and egress traffic.
The MSS value depends on how your network is set up.
-
Magic Transit ingress-only traffic (DSR):
- On your edge router transit ports: Apply a TCP MSS clamp with a maximum of 1,436 bytes.
- On any IPsec/GRE tunnels with third parties on your Magic Transit prefix: Apply the MSS clamp on the internal tunnel interface (most likely on a separate firewall behind the GRE-terminating router) to reduce the current value by 24 bytes.
-
For Magic Transit ingress + egress traffic:
- On the Magic Transit GRE tunnel internal interface: Meaning where the Magit Transit egress traffic will traverse. This may be done automatically once the tunnel is configured but it depends on your devices. The TCP MSS clamp should be 1,436 bytes maximum.
- On any IPsec/GRE tunnels with third parties on your Magic Transit prefix: On the internal tunnel interface (most likely on a separate firewall behind the GRE-terminating router) to reduce its current value by 24 bytes.
For IPsec tunnels, the value you need to specify depends on how your network is set up. The MSS clamping value will be lower than for GRE tunnels, however, since the physical interface will see IPsec-encrypted packets, not TCP packets, and MSS clamping will not apply to those.
-
Magic Transit ingress-only traffic (DSR):
- On your edge router transit ports: TCP MSS clamp should be 1,360 bytes maximum.
- On any IPsec/GRE tunnels with third parties on your Magic Transit prefix: on the internal tunnel interface (most likely on a separate firewall behind the GRE-terminating router) to reduce its current value by 140 bytes.
-
Magic Transit ingress + egress traffic:
- On your edge router: Apply this on your Magic Transit IPsec tunnel internal interface (that is, where the Magic Transit egress traffic will traverse). This may be done automatically once the tunnel is configured but it depends on your devices. TCP MSS clamp should be 1,360 bytes maximum.
- On any IPsec/GRE tunnels with third parties on your Magic Transit prefix: on the internal tunnel interface (most likely on a separate firewall behind the IPsec-terminating device in your premises) to reduce its current value by 140 bytes.