Sometimes it can be useful to inspect the state of a TCP endpoint. Things such as the current congestion window, the retransmission timeout (RTO), duplicate ack threshold, etc. are not reflected in the segments that flow over the wire. Therefore, just looking at packet captures can leave you scratching your head as to why a TCP connection is behaving a certain way.
Using the Linux ss utility coupled with crash, it’s not too difficult to inspect some of the internal TCP state for a socket on Linux. Figuring out the meaning of all the variables and how they relate to the variables referenced in the many TCP RFCs and papers is another matter, but at least we can get some idea of what is going on.
First, you can ask ss to give you information about, say, the NFS sockets in use on a given client system:
[cperl@localhost ~]$ ss -eipn '( dport = :nfs )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.1.10:975 192.168.1.200:2049 ino:12453 sk:ffff8802305a0800
ts sack cubic wscale:6,7 rto:201 rtt:1.875/0.75 ato:40 cwnd:10 ssthresh:40 send 61.8Mbps rcv_rtt:1.875 rcv_space:1814280
ESTAB 0 0 192.168.1.10:971 192.168.1.201:2049 ino:16576 sk:ffff88022f14d6c0
ts sack cubic wscale:6,7 rto:202 rtt:2.125/1.75 ato:40 cwnd:10 ssthresh:405 send 54.5Mbps rcv_rtt:5 rcv_space:3011258
Internally, ss uses the tcp_diag kernel module to extract this extra information (this is done via an AF_NETLINK socket).
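To give a sense of what that looks like, here is a minimal, illustrative C sketch of the kind of request ss sends. It assumes the legacy NETLINK_INET_DIAG interface present in 2.6.32-era kernels like the one above (newer kernels expose the same data via NETLINK_SOCK_DIAG and struct inet_diag_req_v2), and it elides the reply-parsing loop a real client would need:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/inet_diag.h>

int main(void)
{
    /* NETLINK_INET_DIAG is the legacy name; newer headers alias it
     * to NETLINK_SOCK_DIAG. */
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_INET_DIAG);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* Request: dump all TCP sockets and include INET_DIAG_INFO,
     * which carries a struct tcp_info (rto, cwnd, ssthresh, ...). */
    struct {
        struct nlmsghdr nlh;
        struct inet_diag_req req;
    } msg;
    memset(&msg, 0, sizeof(msg));
    msg.nlh.nlmsg_len = sizeof(msg);
    msg.nlh.nlmsg_type = TCPDIAG_GETSOCK;
    msg.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ROOT | NLM_F_MATCH;
    msg.req.idiag_family = AF_INET;
    msg.req.idiag_states = ~0U;                    /* all TCP states */
    msg.req.idiag_ext = 1 << (INET_DIAG_INFO - 1); /* include tcp_info */

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &msg, sizeof(msg), 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        perror("sendto");
        return 1;
    }

    /* Each reply holds a struct inet_diag_msg followed by netlink
     * attributes; a real client would loop until NLMSG_DONE and walk
     * the attributes to pull out the struct tcp_info. */
    char buf[8192];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    printf("received %zd bytes of socket diagnostics\n", n);

    close(fd);
    return 0;
}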
A lot of interesting TCP state is provided in this output. For example, you can see the current retransmission timeout (“rto”), the current buffer space available for receiving data (“rcv_space”) and the congestion control algorithm (“cubic”). Some of the other variables are interesting too, but going into detail on all of them is beyond the scope of this blog post. You can also see the window scale option for the connection (“wscale”): the number before the comma is the scaling applied to the window offered by the remote endpoint, and the number after the comma is the scaling the remote endpoint will apply to the window we offer (i.e. it’s the Window Scale option we sent in our initial SYN).
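To make the scaling concrete, here is a tiny standalone sketch (the raw window value is invented for illustration, not taken from the output above). With wscale:6,7, a 16-bit window value carried in a segment from the peer is left-shifted by 6 to get the effective window in bytes:

#include <stdio.h>

int main(void)
{
    unsigned int raw_window = 501; /* hypothetical value from a segment */
    unsigned int snd_wscale = 6;   /* first number in "wscale:6,7" */

    /* The effective window is the raw value shifted by the scale. */
    unsigned int effective = raw_window << snd_wscale;
    printf("effective send window: %u bytes\n", effective); /* 32064 */
    return 0;
}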
If you’re really, really interested in the kernel’s internal state, you can also take the address of the struct sock that ss gave you (e.g. sk:ffff8802305a0800) and inspect it with crash:
[cperl@localhost ~]$ sudo crash -e emacs
...
KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.1.2.0.1.el6.x86_64/vmlinux
DUMPFILE: /dev/crash
CPUS: 4
DATE: Tue Jul 1 15:26:19 2014
UPTIME: 1 days, 07:32:48
LOAD AVERAGE: 0.08, 0.05, 0.01
TASKS: 871
NODENAME: localhost
RELEASE: 2.6.32-431.1.2.0.1.el6.x86_64
VERSION: #1 SMP Fri Dec 13 13:06:13 UTC 2013
MACHINE: x86_64 (2992 Mhz)
MEMORY: 7.9 GB
PID: 29732
COMMAND: "crash"
TASK: ffff88013a928080 [THREAD_INFO: ffff88011b548000]
CPU: 1
STATE: TASK_RUNNING (ACTIVE)
crash> struct tcp_sock.rcv_nxt,snd_una,reordering ffff8802305a0800
rcv_nxt = 3794247234
snd_una = 2557966926
reordering = 3
Because of the way Linux lays out these structures in memory, you can just cast the struct sock to a struct tcp_sock. If you leave off the specific members in the struct invocation above, you get a recursive dump of all the fields and the structures embedded within (it’s just too large to be useful in this blog post).
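The cast is safe because struct tcp_sock nests the smaller socket structures at offset zero: a pointer to the struct sock is therefore also a valid pointer to the enclosing struct tcp_sock. Below is a heavily abbreviated sketch of that layout (the real definitions in the kernel headers have many more fields); the tcp_sk() helper at the end is the actual cast the kernel itself uses:

typedef unsigned int u32; /* stand-in for the kernel's fixed-width type */

/* Abbreviated sketch of the nesting in the kernel sources. */
struct sock { /* ... core socket state ... */ };

struct inet_sock {
    struct sock sk;              /* first member */
    /* ... IP-level state ... */
};

struct inet_connection_sock {
    struct inet_sock icsk_inet;  /* first member */
    /* ... connection-oriented state, e.g. the retransmit timer ... */
};

struct tcp_sock {
    struct inet_connection_sock inet_conn; /* first member */
    u32 rcv_nxt;  /* next sequence number we expect to receive */
    u32 snd_una;  /* first byte we want an ack for */
    /* ... */
};

/* From the kernel: the cast that makes the crash invocation above work. */
static inline struct tcp_sock *tcp_sk(const struct sock *sk)
{
    return (struct tcp_sock *)sk;
}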
It’s possible you might not be able to get what you want just using crash and may want to turn to a tool like SystemTap to further figure out what is going on, but this is a decent place to start.