Following my previous blogs, Detecting CPU-bound Applications in Server Systems and Detecting Disk I/O-bound Applications in Server Systems, I will continue the discussion in this blog on detecting a network-bound application.
When network-intensive applications run, they can consume almost all of the available network bandwidth, which may cause resource contention and overall system performance issues, especially when multiple applications compete for the same link at the same time.
I will use this blog to illustrate how to detect a network I/O issue on a server (using a synthetic workload) and what tools are available to find the applications that are causing that contention. Additionally, I will show, using the same synthetic workload, the capacities of newer 10 GbE interfaces relative to older 1 GbE interfaces.
For illustration purposes, I downloaded a network benchmark tool called netperf (http://www.netperf.org) to generate heavy network traffic and installed it on two servers running the Linux* operating system. Both machines are equipped with built-in 1 GbE Intel® Ethernet Gigabit Adapters which connect to a 1 GbE Ethernet switch.
We can get information about a network interface by using the ethtool command. For example, displaying the settings of the first Ethernet interface (eth0) shows that its speed is 1000 Mb/s (1 GbE).
# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000003 (3)
drv probe
Link detected: yes
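If you only need the negotiated link speed, filtering the ethtool output gives a quick answer (a minimal one-liner, assuming the interface is eth0 as above):
# ethtool eth0 | grep Speed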
To run the netperf workload from the source machine (10.23.3.33) to the destination machine (10.23.3.34), I first started the netserver process on the destination machine to handle incoming traffic requests.
# netserver
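By default, netserver listens on port 12865. If that port is not usable in your environment, both sides can be pointed at another one; this is only a sketch, and the port number 16604 is an arbitrary example, not a value from my setup. The matching -p option is then passed to netperf on the source machine:
# netserver -p 16604
# netperf -H 10.23.3.34 -p 16604 -t TCP_STREAM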
On the source machine (10.23.3.33) I ran the netperf benchmark tool. The -H option specifies the destination host address; -t TCP_STREAM specifies the type of test; -D 1 updates the results every second; -l 20 runs the test for 20 seconds; and -f g displays the results in Gbits/s.
# netperf -H 10.23.3.34 -t TCP_STREAM -D 1 -l 20 -f g
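netperf can also report how much CPU the benchmark itself consumes on both ends, which is handy when correlating its traffic with the top output shown below; the -c and -C global options enable local and remote CPU utilization reporting (a sketch using the same hosts as above):
# netperf -H 10.23.3.34 -t TCP_STREAM -l 20 -f g -c -C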
While the traffic was being generated, I used the familiar top command to look for any abnormal activity. Normally I would take this step when I experience performance issues, or when they are reported by users of the system. The top output shows 0.0% I/O wait time (%wa), which implies that no disk I/O activity is taking place. Among the running processes, we see netperf using 9% CPU.
# top
top - 12:20:44 up 35 days, 20:31, 8 users, load average: 0.00, 0.01, 0.05
Tasks: 351 total, 1 running, 348 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 48220M total, 21068M used, 27151M free, 744M buffers
Swap: 2045M total, 0M used, 2045M free, 18886M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
75043 root    20   0 10952  944  756 S    9  0.0   0:00.43 netperf
40180 snaik5  20   0  374m  26m  18m S    3  0.1 108:10.47 micsmc-gui
38473 snaik5  20   0 1182m 251m  67m S    1  0.5  49:10.48 amplxe-gui
 5094 root    20   0  125m  17m 6116 S    0  0.0 254:44.35 X
 5336 gdm     20   0  286m  61m  12m S    0  0.1 108:29.45 gdm-simple-gree
38385 snaik5  20   0  362m  19m  14m S    0  0.0   0:08.09 gnome-panel
75042 root    20   0  9048 1332  820 R    0  0.0   0:00.07 top
    1 root    20   0 10528  808  676 S    0  0.0   0:24.12 init
    2 root    20   0     0    0    0 S    0  0.0   0:00.68 kthreadd
    3 root    20   0     0    0    0 S    0  0.0   0:11.41 ksoftirqd/0
    4 root    20   0     0    0    0 S    0  0.0   0:00.07 kworker/0:0
    6 root    RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    7 root    RT   0     0    0    0 S    0  0.0   0:08.97 watchdog/0
    8 root    RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
   10 root    20   0     0    0    0 S    0  0.0   0:14.15 ksoftirqd/1
   12 root    RT   0     0    0    0 S    0  0.0   0:08.41 watchdog/1
   13 root    RT   0     0    0    0 S    0  0.0   0:00.00 migration/2
   15 root    20   0     0    0    0 S    0  0.0   0:21.60 ksoftirqd/2
   16 root    RT   0     0    0    0 S    0  0.0   0:07.83 watchdog/2
   17 root    RT   0     0    0    0 S    0  0.0   0:00.00 migration/3
   18 root    20   0     0    0    0 S    0  0.0   0:00.00 kworker/3:0
   19 root    20   0     0    0    0 S    0  0.0   0:10.72 ksoftirqd/3
   20 root    RT   0     0    0    0 S    0  0.0   0:08.04 watchdog/3
<output truncated>

The command mpstat confirms that there is no disk I/O wait activity (%iowait is 0.00 across all CPUs).
# mpstat 1
Linux 3.0.13-0.27-default (knightscorner1) 01/30/14 _x86_64_
15:36:10 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
15:36:11 all 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.67
15:36:12 all 0.08 0.00 0.02 0.00 0.00 0.00 0.00 0.00 99.90
15:36:13 all 0.06 0.00 0.15 0.00 0.00 0.00 0.00 0.00 99.80
15:36:14 all 0.33 0.00 0.53 0.00 0.00 0.00 0.00 0.00 99.14
15:36:15 all 0.21 0.00 0.62 0.00 0.00 0.00 0.00 0.00 99.18
15:36:16 all 0.09 0.00 0.18 0.00 0.00 0.00 0.00 0.00 99.72
15:36:17 all 0.11 0.00 0.16 0.00 0.00 0.02 0.00 0.00 99.71
15:36:18 all 0.16 0.00 0.55 0.00 0.00 0.05 0.00 0.00 99.24
15:36:19 all 0.23 0.00 0.68 0.00 0.00 0.00 0.00 0.00 99.09
15:36:20 all 0.14 0.00 0.23 0.00 0.00 0.00 0.00 0.00 99.62
15:36:21 all 0.08 0.00 0.15 0.00 0.00 0.00 0.00 0.00 99.77
The two commands confirm that there is no CPU or disk I/O resource contention. As a next step, I checked whether there was any network activity. I used "netstat -i" to show the total packets received/sent on each interface.
# netstat -i
To show the rate of packets sent/received at each interface, I used the sar command.
# sar -n DEV 1 20
However, netstat and sar do not display the network bandwidth used. To figure out whether the network bandwidth is fully utilized, we can use the iftop utility. iftop listens to network traffic and displays the current bandwidth usage by source/destination host pairs on a network interface. In this example, when running iftop, I noticed that the network bandwidth between the two machines was almost saturated (0.99 Gb/s).
# iftop
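By default, iftop picks the first external interface it finds. To watch a specific interface and skip reverse DNS lookups (which generate a little traffic of their own), the -i and -n options can be used (assuming eth0, as on my systems):
# iftop -i eth0 -n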
The iftop output shows that there is heavy traffic from the knightscorner1 (10.23.3.33) system to the knightscorner2 (10.23.3.34) system, and that the usage is close to 1 Gb/s, which is the maximum bandwidth supported by the adapter.
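If iftop is not installed and cannot easily be added, a rough utilization figure can also be derived directly from the kernel's per-interface byte counters. The following is only a sketch, again assuming eth0, that samples the transmit counter one second apart:
# t1=$(cat /sys/class/net/eth0/statistics/tx_bytes); sleep 1; \
  t2=$(cat /sys/class/net/eth0/statistics/tx_bytes); \
  echo "$(( (t2 - t1) * 8 / 1000000 )) Mb/s transmitted"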
To figure out which application is causing such heavy traffic, I installed the nethogs tool from http://nethogs.sourceforge.net . This is the top command equivalent for network bandwidth. nethogs monitors the network traffic bandwidth of each process on the system and helps identify the application that consumes the most network bandwidth.
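nethogs must be run as root and, like iftop, can be pointed at a specific interface (assuming eth0 here):
# nethogs eth0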
The nethogs output shows that a single application (in my experiment, netperf) consumes the majority of the available network bandwidth.
Finally, to compare the performance of a 1 GbE adapter with a 10 GbE adapter, I ran the same tool on two newer systems, each with a 10 GbE adapter. Both machines (in my example, 10.23.3.62 and 10.23.3.64) have built-in 10 GbE ports, and both connect to a 10 GbE switch. I ran the netperf utility with the following -t options:
TCP_STREAM to measure data transfer performance
TCP_RR (request/response test) to measure the transaction rate.
TCP_RR reports the transaction rate, which can be used to deduce latency. Latency here is the round-trip time (RTT): the time needed for a request to travel from one host to the other and for the response to come back. RTT = 1/transaction rate.
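As an illustrative calculation (the transaction rate below is a made-up figure, not one of my measurements), a rate of 20,000 transactions per second corresponds to a round-trip time of 1/20,000 s, or 50 usec:
# awk 'BEGIN { rate = 20000; printf "%.1f usec round-trip\n", 1e6 / rate }'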
On the first 10 GbE host (10.23.3.64), I ran these two tests:
# netperf -H 10.23.3.62 -t TCP_STREAM -D 1 -l 20 -f g
# netperf -H 10.23.3.62 -t TCP_RR
Similarly, on the 1 GbE host (10.23.3.33), I ran the same tests against 10.23.3.34 for comparison:
# netperf -H 10.23.3.34 -t TCP_STREAM -D 1 -l 20 -f g
# netperf -H 10.23.3.34 -t TCP_RR
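As a side note, a single TCP stream does not always fill a 10 GbE link, so it can be worth checking the aggregate of several parallel streams as well; this is only a sketch, assuming four concurrent netperf instances against the 10 GbE host:
# for i in 1 2 3 4; do netperf -H 10.23.3.62 -t TCP_STREAM -l 20 -f g & done; wait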
Each test was repeated three times. I calculated the average throughput (in Gb/s) and latency (in usec) for the 1 GbE and 10 GbE adapters. The figures below summarize the results of my tests.
Throughput measured for 1 GbE and 10 GbE adapters
Latency measured for 1 GbE and 10 GbE adapters
From these results, the 10 GbE network interface is able to sustain far greater throughput while delivering lower latency. For these reasons, if you are experiencing system performance issues due to network I/O contention, it is worth considering an upgrade to 10 GbE.
This blog shows a simple approach for analyzing whether your system is suffering performance issues due to network I/O contention. If a network-bound problem is identified and there is no other way to improve network availability (for example, by distributing the network I/O workload across more systems, or by reducing the applications' I/O volume through data compression), you may consider upgrading the network interfaces to a higher-bandwidth Intel® Ethernet Server Adapter (for example, 10 GbE) to increase the available network bandwidth.
For more information on Intel® 10 GbE products, please refer to http://www.intel.com/content/www/us/en/network-adapters/gigabit-network-adapters/ethernet-server-adapters.html and http://www.intel.com/content/www/us/en/network-adapters/converged-network-adapters.html