The journey to achieve 0 allocations
benchmark
profiling
allocs
Go is fast, but when doing things carefully, besides being faster is very light.
violetear’s history is not very different from many others. It is an HTTP router capable of handling any requests.
One of violetear’s main constraints is how to deal with static and dynamic URL’s. Let’s face it, if you want to replace an important component of your current architecture you should try to break as less as possible - highly trained monkeys hate changes. π
For example, the need to support this type of requests:
host.tld/static/8E605AD7-554A-450D-A72D-D0098D336E8E/127.0.0.1
\_____/\_____________________________________________/
| |
static dynamic Allow only method PUT
host.tld/static/127.0.0.1/8E605AD7-554A-450D-A72D-D0098D336E8E
\_____/\_____________________________________________/
| |
static dynamic Allow only methods GET and HEAD
Which could be represented in the router like:
uuid := `[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`
router.AddRegex(":uuid", uuid)
router.AddRegex(":ipv4", `^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$`)
router.HandleFunc("/static/:uuid/:ipv4", handleUUID, "PUT")
router.HandleFunc("/static/:ipv4/:uuid", handleIPV$, "GET,HEAD")
As if that were not enough, also with versioning support
While implementing this, the use of regular expressions was inevitable, but by using them where they were not required, they came with a big penalty.
This is how the router was splitting the path for all requests:
var splitPathRx = regexp.MustCompile(`[^/ ]+`)
// splitPath returns an slice of the path
func (r *Router) splitPath(p string) []string {
pathParts := splitPathRx.FindAllString(p, -1)
// root (empty slice)
if len(pathParts) == 0 {
pathParts = append(pathParts, "/")
}
return pathParts
}
pprof ΒΆ
Go comes with “batteries included” and is very easy to start profiling the code by the use of the package pprof.
The first goal was to find the bottlenecks. The following code was used for this:
package main
import (
"net/http"
"net/http/pprof"
"github.com/nbari/violetear"
)
func hello(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("hello"))
}
func main() {
r := violetear.New()
r.AddRegex(":word", `^\w+$`)
r.HandleFunc("/hello", hello, "GET,HEAD")
r.HandleFunc("/hello/:word/", hello, "GET,HEAD")
r.HandleFunc("/*", hello, "GET,HEAD")
// Register pprof handlers
r.HandleFunc("/debug/pprof/*", pprof.Index)
r.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
r.HandleFunc("/debug/pprof/profile", pprof.Profile)
r.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
r.HandleFunc("/debug/pprof/trace", pprof.Trace)
http.ListenAndServe(":8080", r)
}
Notice the pprof
handlers:
// Register pprof handlers
r.HandleFunc("/debug/pprof/*", pprof.Index)
r.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
r.HandleFunc("/debug/pprof/profile", pprof.Profile)
r.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
r.HandleFunc("/debug/pprof/trace", pprof.Trace)
wrk ΒΆ
For making some random requests, wrk
+ lua
do a nice combo, this is the
command I used for testing:
$ wrk -c100 -d10 -t2 -s counter.lua http://0:8080/
And the contents of the counter.lua
script:
counter = 0
request = function()
if counter % 100 == 0 then
path = "/"
else
path = path .. counter .. "/"
end
wrk.headers["X-Counter"] = counter
counter = counter + 1
return wrk.format(nil, path)
end
This basically will make requests in the following way:
/
/.
/./.
/././.
/./././.
/././././.
/./././././.
/././././././.
/./././././././.
/././././././././.
...
It is very basic. This script will test the router with both static and dynamic handlers - source code for this can be found here.
Once everything is setup, the router should be up, running and listening for
requests, wrk
is called like mentioned before and in another terminal this
command is used:
go tool pprof http://0:8080/debug/pprof/heap
After running the command you enter into an interactive mode. There I typed
top
and got this output:
1(pprof) top
2Showing nodes accounting for 4694.50kB, 100% of 4694.50kB total
3Showing top 10 nodes out of 25
4 flat flat% sum% cum cum%
5 3155.84kB 67.22% 67.22% 3155.84kB 67.22% regexp.(*bitState).reset /usr/local/Cellar/go/1.9/libexec/src/regexp/backtrack.go
6 514kB 10.95% 78.17% 514kB 10.95% bufio.NewWriterSize /usr/local/Cellar/go/1.9/libexec/src/bufio/bufio.go
7 512.62kB 10.92% 89.09% 512.62kB 10.92% regexp.(*Regexp).FindAllString.func1 /usr/local/Cellar/go/1.9/libexec/src/regexp/regexp.go
8 512.03kB 10.91% 100% 512.03kB 10.91% syscall.anyToSockaddr /usr/local/Cellar/go/1.9/libexec/src/syscall/syscall_bsd.go
9 0 0% 100% 3668.46kB 78.14% github.com/nbari/violetear.(*Router).ServeHTTP /Users/nbari/projects/go/src/github.com/nbari/violetear/violetear.go
10 0 0% 100% 3668.46kB 78.14% github.com/nbari/violetear.(*Router).splitPath /Users/nbari/projects/go/src/github.com/nbari/violetear/violetear.go
11 0 0% 100% 512.03kB 10.91% internal/poll.(*FD).Accept /usr/local/Cellar/go/1.9/libexec/src/internal/poll/fd_unix.go
12 0 0% 100% 512.03kB 10.91% internal/poll.accept /usr/local/Cellar/go/1.9/libexec/src/internal/poll/sys_cloexec.go
13 0 0% 100% 512.03kB 10.91% main.main /Users/nbari/projects/go/src/github.com/nbari/go-sandbox/profile-violetear/profile/main.go
14 0 0% 100% 512.03kB 10.91% net.(*TCPListener).AcceptTCP /usr/local/Cellar/go/1.9/libexec/src/net/tcpsock.go
Notice lines 7 and 10
Still in the interactive mode, I typed: list splitPath
which helps to show the
lines of code. This was the output:
1(pprof) list splitPath
2Total: 2.02MB
3ROUTINE ======================== github.com/nbari/violetear.(*Router).splitPath in /Users/nbari/projects/go/src/github.com/nbari/violetear/violetear.go
4 0 1.02MB (flat, cum) 50.37% of Total
5 . . 261: return
6 . . 262:}
7 . . 263:
8 . . 264:// splitPath returns an slice of the path
9 . . 265:func (r *Router) splitPath(p string) []string {
10 . 1.02MB 266: pathParts := splitPathRx.FindAllString(p, -1)
11 . . 267:
12 . . 268: // root (empty slice)
13 . . 269: if len(pathParts) == 0 {
14 . . 270: pathParts = append(pathParts, "/")
15 . . 271: }
16(pprof)
This showed that splitting the path using a regular expression was inefficient. I changed the function to something like this:
func (r *Router) splitPath(p string) []string {
pathParts := strings.FieldsFunc(p, func(c rune) bool {
return c == '/'
})
// root (empty slice)
if len(pathParts) == 0 {
return []string{"/"}
}
return pathParts
}
I created a small bench test:
package violetear
import (
"net/http"
"testing"
)
func benchRequest(b *testing.B, router http.Handler, r *http.Request) {
w := &ResponseWriter{}
u := r.URL
rq := u.RawQuery
r.RequestURI = u.RequestURI()
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
u.RawQuery = rq
router.ServeHTTP(w, r)
}
}
func BenchmarkRouterStatic(b *testing.B) {
router := New()
router.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {}, "GET,HEAD")
r, _ := http.NewRequest("GET", "/hello", nil)
benchRequest(b, router, r)
}
func BenchmarkRouterDynamic(b *testing.B) {
router := New()
router.AddRegex(":word", `^\w+$`)
router.HandleFunc("/test/:word", func(w http.ResponseWriter, r *http.Request) {}, "GET,HEAD")
r, _ := http.NewRequest("GET", "/test/foo", nil)
benchRequest(b, router, r)
}
And run it using:
go test -run=^$ -bench=.
The output was a small win of 3 allocation less π
1$ go test -run=^$ -bench=.
22017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
3goos: darwin
4goarch: amd64
5pkg: github.com/nbari/violetear
6BenchmarkRouterStatic-4 2017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
72017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
82017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
9 2000000 692 ns/op 616 B/op 10 allocs/op
102017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
11BenchmarkRouterDynamic-4 2017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
122017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
132017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
14 1000000 1256 ns/op 936 B/op 12 allocs/op
15PASS
16ok github.com/nbari/violetear 3.380s
Before doing the changes this was the output: 11 allocations for static routes and 14 for dynamic π’
1$ go test -run=^$ -bench=.
22017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
3goos: darwin
4goarch: amd64
5pkg: github.com/nbari/violetear
6BenchmarkRouterStatic-4 2017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
72017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
82017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
91000000 1145 ns/op 776 B/op 11 allocs/op
102017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
11BenchmarkRouterDynamic-4 2017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
122017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
132017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
141000000 1835 ns/op 1096 B/op 14 allocs/op
15PASS
16ok github.com/nbari/violetear 3.029s
To get more info about the allocations, I mainly used
pprof
with options like alloc_space
:
go tool pprof --alloc_space http://0:8080/debug/pprof/heap
When setting allocfreetrace=1
to environment variable GODEBUG, you can see
stacktrace where allocation occurs:
allocfreetrace: setting allocfreetrace=1 causes every allocation to be
profiled and a stack trace printed on each object's allocation and free.
To build the tests:
go test -c
And to run the tests writing to a file all the output:
GODEBUG=allocfreetrace=1 ./test.test -test.run=none -test.bench=BenchmarkRouter -test.benchtime=10ms 2>trace.log
After doing all this I was available to save up to 90% of the allocations for static routes, from 11 to 1:
1$ go test -run=^$ -bench=.
22017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
3goos: darwin
4goarch: amd64
5pkg: github.com/nbari/violetear
6BenchmarkRouterStatic-4 2017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
72017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
82017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
92017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
1010000000 213 ns/op 16 B/op 1 allocs/op
112017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
12BenchmarkRouterDynamic-4 2017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
132017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
142017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
151000000 1115 ns/op 816 B/op 7 allocs/op
16PASS
17ok github.com/nbari/violetear 3.507s
go-torch π₯ ΒΆ
To have a better graphical representation about the profiling output go-torch is an excellent tool for creating flame graphs.
The easiest way to run it is like this:
$ docker run uber/go-torch -u http://[address-of-host] -p > torch.svg
user the IP/address of your host, not localhost or 127.0.0.1
This is an example of how it was behaving the router before:
After:
0 allocations
Make it happen π
1$ go test -run=^$ -bench=.
22017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
3goos: darwin
4goarch: amd64
5pkg: github.com/nbari/violetear
6BenchmarkRouterStatic-4 2017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
72017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
82017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
92017/10/05 21:57:32 Adding path: /hello [GET,HEAD]
1010000000 122 ns/op 0 B/op 0 allocs/op
112017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
12BenchmarkRouterDynamic-4 2017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
132017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
142017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
152017/10/05 21:57:34 Adding path: /test/:word [GET,HEAD]
16 2000000 913 ns/op 784 B/op 6 allocs/op
17PASS
18ok github.com/nbari/violetear 4.144s
The router since always has been behaving quite well, this mainly thanks to Go itself and well, to the modest traffic it has been handling.
While doing some benchmarks and simulating some DDOS I found some patterns. All
were related to the way the request.URL.Path
was handled. The thing I noticed
was that the bigger the request path was the more memory allocations were used.
At the end was pretty much making sense since by splitting the path by /
a
slice was returned with all the possible parts of the path.
For example, for a request like:
/the/quick/brown/fox/jumps/over/the/lazy/dog
The router was creating a slice of 9 items:
[ the quick brown fox jumps over the lazy dog]
In many cases, this would never be defined as routes and could probably just be served by a catch-all(*) handler.
To fix this, instead of splitting and returning a slice, the path is now parsed. Each component of the path from left to right is used to find within the set of defined handlers; so for the same request shown before, the router converts it to:
key := "the"
path := "/quick/brown/fox/jumps/over/the/lazy/dog"
It searches for a possible handler named the
and if found then next will try again using
quick
and shorting the path to /brown/fox/jumps/over/the/lazy/dog
:
key := "quick"
path := "/brown/fox/jumps/over/the/lazy/dog"
key := "brown"
path := "/fox/jumps/over/the/lazy/dog"
key := "fox"
path := "/jumps/over/the/lazy/dog"
And so on until a handler matches the key
or a 404 is returned.
This by principle made the router more efficient but now the challenge was how to parse the path without creating allocations.
The tricky part was to avoid concatenating strings runes or bytes, by just using the existing path and return chunks of the slice did the trick, the current code look like is this:
1func (t *Trie) SplitPath(path string) (string, string) {
2 var key string
3 if path == "" {
4 return key, path
5 } else if path == "/" {
6 return path, ""
7 }
8 for i := 0; i < len(path); i++ {
9 if path[i] == '/' {
10 if i == 0 {
11 return t.SplitPath(path[1:])
12 }
13 if i > 0 {
14 key = path[:i]
15 path = path[i:]
16 if path == "/" {
17 return key, ""
18 }
19 return key, path
20 }
21 }
22 }
23 return path, ""
24}
The router now is faster and static routes are 0% fat π.
There is still work in progress - who would think that behind a simple router there is too much logic involved. The experience so far has been very pleasant, learning and improving every day more about Go and its niceties.
To make an even better improvement, any tips, comments or a review will be more than welcome π