esBPF: Stress-Testing Software-Offload against iptables
The esBPF project is now more than a year old. It began with a question: is it worth filtering ingress packets at the Software-Offload layer instead of in the network stack? Software-Offload is similar to Hardware-Offload, but it works in the Ethernet driver. Now that the prototype has been released, it is time to stress-test it, with iptables as the point of comparison.
Before walking through the article, let me define a few short terms to avoid typing the long ones over and over:
Long Term | Short Term |
---|---|
Raspberry Pi 3 | Rpi3 |
Host Machine | Host |
Testbed
Host and Rpi3 are connected to the same LAN through the AP below. The AP supports HW-offload and runs in Bridge mode, so its kernel is not interrupted to forward packets between them.
              High-Performance AP
              - HW-offload Supported
              - Bridge Mode
              +-----------------+
              |   Wireless AP   |
              +-----------------+
        100Mbps link |   | 1Gbps link
          +----------+   +-----------+
          |                          |
+-------------------+      +-------------------+
|  Raspberry Pi 3   |      |   Host Machine    |
| (192.168.219.103) |      | (192.168.219.108) |
+-------------------+      +-------------------+
hping3 is used for the stress test; it simply floods Rpi3 with ICMP packets.
$ hping3 --icmp --faster 192.168.219.103 -d 20
Tuning Raspberry-Pi 3 for the testing
- Ubuntu 22.10 Kinetic Release
- Kernel 5.19.0-1007 (Arm64)
- Enable CONFIG_HOTPLUG_CPU to turn CPU cores on/off
- esBPF-based customized Ethernet driver, smsc95xx-esbpf
- Turn off the wlan0 interface so that it does not mess up routing
Rpi3 boots with maxcpus=2 on the kernel command line, so that the full traffic load lands on a fixed number of cores instead of all four. Hence we have two online cores and two offline cores:
ubuntu@ubuntu:~$ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 4
  On-line CPU(s) list:  0,1
  Off-line CPU(s) list: 2,3
Vendor ID:              ARM
  Model name:           Cortex-A53
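The same check can be done from a program. The following is a minimal sketch, assuming the standard Linux sysfs file /sys/devices/system/cpu/online, which holds a CPU list such as "0-1" or "0,1". The function names count_cpu_list and online_cpu_count are hypothetical and not part of esBPF.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts the CPUs in a kernel CPU list such as "0,1", "0-3" or "0-1,3".
 * Returns -1 if the list is malformed. */
int count_cpu_list(const char *list)
{
    const char *p = list;
    int count = 0;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)
            return -1;              /* no number where one was expected */
        p = end;
        if (*p == '-') {            /* range "lo-hi" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            p = end;
        }
        count += (int)(hi - lo + 1);
        if (*p == ',')
            p++;
    }
    return count;
}

/* Reads the online CPU list from sysfs; returns the count or -1 on error. */
int online_cpu_count(void)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/cpu/online", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return count_cpu_list(buf);
}
```

On the Rpi3 configured as above, online_cpu_count() would be expected to return 2 if maxcpus=2 took effect.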
A brief look at smsc95xx-esbpf
Once the driver is loaded into the kernel, two important files appear under the directory /proc/smsc95xx/esbpf:
- rx_enable : turns esbpf operations on/off.
- rx_hooks : receives a program of cBPF instructions written to it.
Stress-testing
We are going to look at the mpstat values and compare the NET_RX counts in /proc/softirqs before and after running hping3. In each case, assume hping3 runs on Host for 60 seconds.
Here is the CPU usage of Rpi3 while idle. Before the massive traffic is generated on the LAN, the %idle column is almost the same in both test cases, iptables and Software-Offload.
$ mpstat -P ALL 3
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.17    0.00    0.00    0.17    0.00    0.00    0.00   99.66
  0    0.00    0.00    0.34    0.00    0.00    0.00    0.00    0.00    0.00   99.66
  1    0.00    0.00    0.00    0.00    0.00    0.34    0.00    0.00    0.00   99.66
1. iptables
In the first test, the following rule is appended to the INPUT chain on Rpi3. As a result, one of the CPUs is completely consumed by softirq processing, leaving it too busy to do anything else.
$ iptables -A INPUT -p icmp -j DROP
$ iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       icmp --  *      *       0.0.0.0/0            0.0.0.0/0

# NET_RX softirq count before massive traffic
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        123         66          0          0

# NET_RX softirq count after that
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:      15040      35021          0          0

# mpstat
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.18    0.00    0.00   52.89    0.00    0.00    0.00   46.94
  0    0.00    0.00    0.37    0.00    0.00    0.74    0.00    0.00    0.00   98.89
  1    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
2. esBPF
In the second test, the same type of packets is dropped in Software-Offload, in other words, in the driver. Two tools are involved, tcpdump and filter_icmp, but the latter already has the cBPF instructions hard-coded, so tcpdump isn't necessary at this point.
The hard-coded part is as follows:
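#include <linux/filter.h> /* struct sock_filter */

struct sock_filter insns[] = {
    /* tcpdump -dd -nn icmp */
    { 0x28, 0, 0, 0x0000000c }, /* ldh [12]: load the EtherType */
    { 0x15, 0, 3, 0x00000800 }, /* jeq #0x0800: IPv4? otherwise jump to "ret #0" */
    { 0x30, 0, 0, 0x00000017 }, /* ldb [23]: load the IP protocol field */
    { 0x15, 0, 1, 0x00000001 }, /* jeq #1: ICMP? otherwise jump to "ret #0" */
    { 0x6, 0, 0, 0x00040000 },  /* ret #262144: packet matches */
    { 0x6, 0, 0, 0x00000000 },  /* ret #0: packet does not match */
};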
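To see what these six instructions decide, here is a small user-space sketch that interprets them against a hand-built frame. It is not the esBPF implementation: struct cbpf_insn, run_filter and make_frame are hypothetical names, the struct merely mirrors the layout of the kernel's struct sock_filter so that the example builds without Linux headers, and only the four opcodes used by this filter are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same layout as struct sock_filter. */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* The program from the article (tcpdump -dd -nn icmp). */
static const struct cbpf_insn icmp_filter[] = {
    { 0x28, 0, 0, 0x0000000c },
    { 0x15, 0, 3, 0x00000800 },
    { 0x30, 0, 0, 0x00000017 },
    { 0x15, 0, 1, 0x00000001 },
    { 0x06, 0, 0, 0x00040000 },
    { 0x06, 0, 0, 0x00000000 },
};

/* Runs the program on a packet. Returns the filter's return value
 * (non-zero means "match"), or 0 on an out-of-bounds load or an
 * opcode this sketch does not know. */
uint32_t run_filter(const struct cbpf_insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t a = 0; /* accumulator */

    assert(prog != NULL && pkt != NULL);
    for (size_t pc = 0; pc < n; pc++) {
        const struct cbpf_insn *in = &prog[pc];

        switch (in->code) {
        case 0x28: /* BPF_LD | BPF_H | BPF_ABS: 16-bit big-endian load */
            if ((size_t)in->k + 2 > len)
                return 0;
            a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case 0x30: /* BPF_LD | BPF_B | BPF_ABS: 8-bit load */
            if ((size_t)in->k + 1 > len)
                return 0;
            a = pkt[in->k];
            break;
        case 0x15: /* BPF_JMP | BPF_JEQ | BPF_K: relative jump */
            pc += (a == in->k) ? in->jt : in->jf;
            break;
        case 0x06: /* BPF_RET | BPF_K */
            return in->k;
        default:
            return 0;
        }
    }
    return 0;
}

/* Fills buf with a zeroed frame carrying the given EtherType (bytes 12-13)
 * and IPv4 protocol number (byte 23). */
void make_frame(uint8_t *buf, size_t len, uint16_t ethertype, uint8_t proto)
{
    memset(buf, 0, len);
    if (len >= 24) {
        buf[12] = (uint8_t)(ethertype >> 8);
        buf[13] = (uint8_t)(ethertype & 0xff);
        buf[23] = proto;
    }
}
```

With a 34-byte frame built by make_frame(f, 34, 0x0800, 1), run_filter returns 0x00040000 (match); with protocol 6 (TCP) or a non-IPv4 EtherType it returns 0.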
The program is run with the following commands: the first writes the above instructions to the esBPF module, and the second enables filtering.
$ sudo ./filter_icmp /proc/smsc95xx/esbpf/rx_hooks
$ echo 1 | sudo tee /proc/smsc95xx/esbpf/rx_enable
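The article does not show how filter_icmp hands the program to the driver. The sketch below is one plausible shape, under the loud assumption that rx_hooks accepts the raw instruction array in a single write(2); the real tool and the real file format may differ. struct cbpf_insn and write_filter are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in with the same layout as struct sock_filter. */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* The program from the article (tcpdump -dd -nn icmp). */
static const struct cbpf_insn icmp_filter[] = {
    { 0x28, 0, 0, 0x0000000c },
    { 0x15, 0, 3, 0x00000800 },
    { 0x30, 0, 0, 0x00000017 },
    { 0x15, 0, 1, 0x00000001 },
    { 0x06, 0, 0, 0x00040000 },
    { 0x06, 0, 0, 0x00000000 },
};

/* ASSUMPTION: the target file takes the whole instruction array as raw
 * bytes in one write. Returns the number of bytes written, or -1. */
ssize_t write_filter(const char *path, const struct cbpf_insn *prog, size_t n)
{
    assert(path != NULL && prog != NULL && n > 0);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, prog, n * sizeof prog[0]);
    close(fd);
    return written;
}
```

Under that assumption, a call such as write_filter("/proc/smsc95xx/esbpf/rx_hooks", icmp_filter, 6) would push 48 bytes (six 8-byte instructions) to the driver and needs root privileges.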
Even though hping3 runs exactly as before, NET_RX did not rise nearly as much as in the first case.
# NET_RX softirq count before massive traffic
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        129         81          0          0

# NET_RX softirq count after that
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        141         94          0          0
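To put the two runs side by side, the NET_RX lines can be turned into a single delta per test. The following is a minimal sketch that parses a NET_RX line as it appears in /proc/softirqs and sums the per-CPU increase; parse_net_rx and net_rx_delta are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parses one "NET_RX: c0 c1 ..." line. Stores up to max per-CPU counters
 * in out[] and returns how many were found, or -1 if this is not a
 * NET_RX line. */
int parse_net_rx(const char *line, unsigned long *out, int max)
{
    const char *p;
    int n = 0;

    assert(line != NULL && out != NULL && max > 0);
    p = strstr(line, "NET_RX:");
    if (p == NULL)
        return -1;
    p += strlen("NET_RX:");

    while (n < max) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);

        if (end == p)
            break;              /* no more numbers on the line */
        out[n++] = v;
        p = end;
    }
    return n;
}

/* Returns the total NET_RX increase across all CPUs between two samples,
 * or 0 if the lines cannot be compared. */
unsigned long net_rx_delta(const char *before, const char *after)
{
    unsigned long b[64], a[64], sum = 0;
    int nb = parse_net_rx(before, b, 64);
    int na = parse_net_rx(after, a, 64);

    if (nb < 0 || na < 0 || nb != na)
        return 0;
    for (int i = 0; i < nb; i++)
        sum += a[i] - b[i];
    return sum;
}
```

Fed with the samples in this article, the iptables run gives a delta of 49872 NET_RX softirqs (14917 on CPU0 plus 34955 on CPU1), while the esBPF run gives 25 (12 plus 13).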
Also, the average CPU usage by softirq ranges from around 8% in the best case to just under 30% in the worst case.
# mpstat in the best case
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.64    0.00    0.00    7.99    0.00    0.00    0.00   91.37
  0    0.00    0.00    0.65    0.00    0.00    6.54    0.00    0.00    0.00   92.81
  1    0.00    0.00    0.62    0.00    0.00    9.38    0.00    0.00    0.00   90.00

# mpstat in the worst case
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all   18.31    0.00    4.58    0.96    0.00   27.47    0.00    0.00    0.00   48.67
  0   14.50    0.00    4.00    1.00    0.00   26.00    0.00    0.00    0.00   54.50
  1   21.86    0.00    5.12    0.93    0.00   28.84    0.00    0.00    0.00   43.26
Note that you may occasionally see a few ICMP packets reach the network stack even though esBPF is enabled. No need to worry: they come from the lo interface.
Conclusion
esBPF works in Software-Offload, also known as the device driver layer, as opposed to Netfilter, the framework behind iptables, which works in the network stack. Hence it drops all incoming packets that match the filters at the tasklet level, before they reach NET_RX (part of the network stack), and as the esBPF results show, the kernel is spared that extra work.
The project can outperform packet filtering in the network stack in some cases, even though its worst case consumes about four times the CPU resources of its best case. Of course, that depends on how large the cBPF program loaded into esBPF is.
The project is still in progress: it needs to become more flexible, be optimized, and gain a cache mechanism.
This stress test convinced me that the project is worth more effort and that I should keep working on it; at least I am not wasting my time. It was also a great experience to take responsibility for the entire process, from design to testing.
I hope everyone has enjoyed the article, cheers ;-)