The other day I was running some validation and performance tests for a new web service. To do this, I was effectively making a bunch of queries against a local HTTP endpoint. Both the server and client are written in Go, and the code querying the endpoint looks somewhat like this:
This simplified setup has two functions.
main function iterates over all records (what exactly these records are does not matter in this context).
Crucially, it also sets up a shared HTTP transport and client, since this is the standard way of doing HTTP connection reuse in Golang.
runRequest function takes this shared HTTP client as an argument and uses it to send the request to the server.
However, after a short while, my server (running with
go run -race cmd/server) crashed with the following error message:
race: limit on 8128 simultaneously alive goroutines is exceeded, dying.
Woah, that’s a pretty serious error message!
After a quick online search, I found out that when running a Go program with the race detector enabled, it will automatically limit the number of active goroutines to 8128. This indicates that the application is leaking goroutines somewhere, so I connected to the pprof endpoint and looked at the active goroutines. And indeed, there are thousands of active routines “stuck”:
5342 @ 0x470436 0x469097 0x49c6c9 0x53253d 0x5350e5 0x5350b5 0x652b4a 0x66fcc5 0x66e4c8 0x7d55a7 0x7d4f7d 0xbd2569 0xbd20f5 0x470072 0x4a1f41 # 0x49c6c8 internal/poll.runtime_pollWait+0x88 /usr/lib/golang/src/runtime/netpoll.go:302 # 0x53253c internal/poll.(*pollDesc).wait+0xbc /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:83 # 0x5350e4 internal/poll.(*pollDesc).waitRead+0x424 /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:88 # 0x5350b4 internal/poll.(*FD).Accept+0x3f4 /usr/lib/golang/src/internal/poll/fd_unix.go:614 # 0x652b49 net.(*netFD).accept+0x49 /usr/lib/golang/src/net/fd_unix.go:172 # 0x66fcc4 net.(*TCPListener).accept+0x44 /usr/lib/golang/src/net/tcpsock_posix.go:139 # 0x66e4c7 net.(*TCPListener).Accept+0x67 /usr/lib/golang/src/net/tcpsock.go:288 # 0x7d55a6 net/http.(*Server).Serve+0x5a6 /usr/lib/golang/src/net/http/server.go:3039 # 0x7d4f7c net/http.(*Server).ListenAndServe+0xbc /usr/lib/golang/src/net/http/server.go:2968 # 0xbd2568 main.serve+0x448 /app/cmd/server/main.go:98 # 0xbd20f4 main.main+0x7d4 /app/cmd/server/main.go:61 # 0x470071 runtime.main+0x211 /usr/lib/golang/src/runtime/proc.go:250
This is strange, because I thought I had connection reuse enabled… and furthermore, the client is performing all requests sequentially.
It turns out that the
http.Client will only reuse connections when the response body has been fully read and closed!
Otherwise, the application keeps accumulating open connections, especially because the number of connections per host is unlimited by default (
Upon reading the Go documentation about HTTP responses very carefully, I found this:
The default HTTP client’s Transport may not reuse HTTP/1.x “keep-alive” TCP connections if the Body is not read to completion and closed.
Note that just closing the body is not enough, the body also has to be read fully before closing it. This slightly unintuitive behavior was also discussed in https://stackoverflow.com/a/17953506.
To get connection reuse working in this scenario, I had to modify the code as follows:
Et voilà, no more thousands of active goroutines (and associated TCP connections)! This was another good example of why it’s useful to always run Go programs with the race detector during development.