Move patches to backport-5.10, since the series was accepted upstream

Signed-off-by: Felix Fietkau <nbd@nbd.name>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 24 Mar 2021 02:30:55 +0100
Subject: [PATCH] docs: nf_flowtable: update documentation with
 enhancements

This patch updates the flowtable documentation to describe recent
enhancements:

- Offload action is available after the first packets go through the
  classic forwarding path.
- IPv4 and IPv6 are supported. Only TCP and UDP layer 4 are supported at
  this stage.
- Tuple has been augmented to track VLAN id and PPPoE session id.
- Bridge and IP forwarding integration, including bridge VLAN filtering
  support.
- Hardware offload support.
- Describe the [OFFLOAD] and [HW_OFFLOAD] tags in the conntrack table
  listing.
- Replace 'flow offload' by 'flow add' in example rulesets (preferred
  syntax).
- Describe existing cache limitations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

--- a/Documentation/networking/nf_flowtable.rst
+++ b/Documentation/networking/nf_flowtable.rst
@@ -4,35 +4,38 @@
 Netfilter's flowtable infrastructure
 ====================================
 
-This documentation describes the software flowtable infrastructure available in
-Netfilter since Linux kernel 4.16.
+This documentation describes the Netfilter flowtable infrastructure, which
+allows you to define a fastpath through the flowtable datapath. This
+infrastructure also provides hardware offload support. The flowtable supports
+the layer 3 protocols IPv4 and IPv6 and the layer 4 protocols TCP and UDP.
 
 Overview
 --------
 
-Initial packets follow the classic forwarding path, once the flow enters the
-established state according to the conntrack semantics (ie. we have seen traffic
-in both directions), then you can decide to offload the flow to the flowtable
-from the forward chain via the 'flow offload' action available in nftables.
-
-Packets that find an entry in the flowtable (ie. flowtable hit) are sent to the
-output netdevice via neigh_xmit(), hence, they bypass the classic forwarding
-path (the visible effect is that you do not see these packets from any of the
-netfilter hooks coming after the ingress). In case of flowtable miss, the packet
-follows the classic forward path.
-
-The flowtable uses a resizable hashtable, lookups are based on the following
-7-tuple selectors: source, destination, layer 3 and layer 4 protocols, source
-and destination ports and the input interface (useful in case there are several
-conntrack zones in place).
-
-Flowtables are populated via the 'flow offload' nftables action, so the user can
-selectively specify what flows are placed into the flow table. Hence, packets
-follow the classic forwarding path unless the user explicitly instruct packets
-to use this new alternative forwarding path via nftables policy.
+Once the first packet of the flow successfully goes through the IP forwarding
+path, from the second packet on, you might decide to offload the flow to the
+flowtable through your ruleset. The flowtable infrastructure provides a rule
+action that allows you to specify when to add a flow to the flowtable.
+
+A packet that finds a matching entry in the flowtable (ie. flowtable hit) is
+transmitted to the output netdevice via neigh_xmit(), hence, packets bypass the
+classic IP forwarding path (the visible effect is that you do not see these
+packets from any of the Netfilter hooks coming after ingress). In case that
+there is no matching entry in the flowtable (ie. flowtable miss), the packet
+follows the classic IP forwarding path.
+
+The flowtable uses a resizable hashtable. Lookups are based on the following
+n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3
+source and destination, layer 4 source and destination ports and the input
+interface (useful in case there are several conntrack zones in place).
+
+The 'flow add' action allows you to populate the flowtable; the user selectively
+specifies what flows are placed into the flowtable. Hence, packets follow the
+classic IP forwarding path unless the user explicitly instructs flows to use
+this new alternative forwarding path via policy.
 
-This is represented in Fig.1, which describes the classic forwarding path
-including the Netfilter hooks and the flowtable fastpath bypass.
+The flowtable datapath is represented in Fig.1, which describes the classic IP
+forwarding path including the Netfilter hooks and the flowtable fastpath bypass.
 
 ::
 
@@ -67,11 +70,13 @@ including the Netfilter hooks and the fl
 	       Fig.1 Netfilter hooks and flowtable interactions
 
 The flowtable entry also stores the NAT configuration, so all packets are
-mangled according to the NAT policy that matches the initial packets that went
-through the classic forwarding path. The TTL is decremented before calling
-neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
-path given that the transport selectors are missing, therefore flowtable lookup
-is not possible.
+mangled according to the NAT policy that is specified in the classic IP
+forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented
+traffic is passed up to follow the classic IP forwarding path given that the
+transport header is missing; in this case, flowtable lookups are not possible.
+TCP RST and FIN packets are also passed up to the classic IP forwarding path to
+release the flow gracefully. Packets that exceed the MTU are also passed up to
+the classic forwarding path to report packet-too-big ICMP errors to the sender.
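+
+For instance, a flow that is masqueraded in the classic IP forwarding path
+keeps using that NAT binding once it has been added to the flowtable. A minimal
+ruleset sketch combining masquerading with the 'flow add' action could look
+like this (eth0 and eth1 are placeholder device names):
+
+::
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0, eth1 };
+		}
+		chain y {
+			type filter hook forward priority 0; policy accept;
+			ip protocol tcp flow add @f
+		}
+		chain z {
+			type nat hook postrouting priority 100; policy accept;
+			oifname "eth0" masquerade
+		}
+	}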
 
 Example configuration
 ---------------------
@@ -85,7 +90,7 @@ flowtable and add one rule to your forwa
 		}
 		chain y {
 			type filter hook forward priority 0; policy accept;
-			ip protocol tcp flow offload @f
+			ip protocol tcp flow add @f
 			counter packets 0 bytes 0
 		}
 	}
@@ -103,6 +108,117 @@ flow is offloaded, you will observe that
 does not get updated for the packets that are being forwarded through the
 forwarding bypass.
 
+You can identify offloaded flows through the [OFFLOAD] tag when listing your
+connection tracking table.
+
+::
+	# conntrack -L
+	tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2
+
+
+Layer 2 encapsulation
+---------------------
+
+Since Linux kernel 5.13, the flowtable infrastructure discovers the real
+netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath
+parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the
+VLAN ID / PPPoE session ID which are used for the flowtable lookups. The
+flowtable datapath also deals with layer 2 decapsulation.
+
+You do not need to add the PPPoE and the VLAN devices to your flowtable;
+instead, the real device is sufficient for the flowtable to track your flows.
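+
+For instance, assuming PPPoE or VLAN devices are stacked on top of eth0, a
+flowtable definition along the following lines should be enough to cover the
+encapsulated flows (eth0 is a placeholder device name):
+
+::
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0 };
+		}
+		...
+	}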
+
+Bridge and IP forwarding
+------------------------
+
+Since Linux kernel 5.13, you can add bridge ports to the flowtable. The
+flowtable infrastructure discovers the topology behind the bridge device. This
+allows the flowtable to define a fastpath bypass between the bridge ports
+(represented as eth1 and eth2 in the example figure below) and the gateway
+device (represented as eth0) in your switch/router.
+
+::
+                      fastpath bypass
+               .-------------------------.
+              /                           \
+              |           IP forwarding   |
+              |          /             \ \/
+              |       br0               eth0 ..... eth0
+              .       / \                          *host B*
+               -> eth1  eth2
+                   .           *switch/router*
+                   .
+                   .
+                 eth0
+               *host A*
+
+The flowtable infrastructure also supports bridge VLAN filtering actions
+such as PVID and untagged. You can also stack a classic VLAN device on top of
+your bridge port.
+
+If you would like your flowtable to define a fastpath between your bridge
+ports and your IP forwarding path, you have to add your bridge ports (as
+represented by the real netdevice) to your flowtable definition.
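+
+Following the example figure above, a sketch of such a flowtable definition
+could list the two bridge ports together with the gateway device:
+
+::
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0, eth1, eth2 };
+		}
+		...
+	}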
+
+Counters
+--------
+
+The flowtable can synchronize packet and byte counters with the existing
+connection tracking entry by specifying the counter statement in your flowtable
+definition, e.g.
+
+::
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0, eth1 };
+			counter
+		}
+		...
+	}
+
+Counter support is available since Linux kernel 5.7.
+
+Hardware offload
+----------------
+
+If your network device provides hardware offload support, you can turn it on by
+means of the 'offload' flag in your flowtable definition, e.g.
+
+::
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0, eth1 };
+			flags offload;
+		}
+		...
+	}
+
+There is a workqueue that adds the flows to the hardware. Note that a few
+packets might still run over the flowtable software path until the workqueue has
+a chance to offload the flow to the network device.
+
+You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when
+listing your connection tracking table. Please note that the [OFFLOAD] tag
+refers to the software offload mode, so there is a distinction between
+[OFFLOAD], which refers to the software flowtable fastpath, and [HW_OFFLOAD],
+which refers to the hardware offload datapath being used by the flow.
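+
+For example, a hardware offloaded flow shows up in the listing along these
+lines (addresses and ports are illustrative, mirroring the earlier example):
+
+::
+	# conntrack -L
+	tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [HW_OFFLOAD] mark=0 use=2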
+
+The flowtable hardware offload infrastructure also supports DSA
+(Distributed Switch Architecture).
+
+Limitations
+-----------
+
+The flowtable behaves like a cache. The flowtable entries might get stale if
+either the destination MAC address or the egress netdevice that is used for
+transmission changes.
+
+This might be a problem if:
+
+- You run the flowtable in software mode and you combine bridge and IP
+  forwarding in your setup.
+- Hardware offload is enabled.
+
 More reading
 ------------
 