The journey to achieve 0 allocations
                    
                    
                    
                        
                            benchmark
                        
                            profiling
                        
                            allocs
                        
                    
                    
                    
                Go is fast, but when doing things carefully, besides being faster is very light.
violetear’s history is not very different from many others. It is an HTTP router capable of handling any requests.
One of violetear’s main constraints is how to deal with static and dynamic URL’s. Let’s face it, if you want to replace an important component of your current architecture you should try to break as less as possible - highly trained monkeys hate changes. π
For example, the need to support this type of requests:
host.tld/static/8E605AD7-554A-450D-A72D-D0098D336E8E/127.0.0.1
        \_____/\_____________________________________________/
           |                         |
         static                   dynamic        Allow only method PUT
host.tld/static/127.0.0.1/8E605AD7-554A-450D-A72D-D0098D336E8E
        \_____/\_____________________________________________/
           |                         |
         static                   dynamic        Allow only methods GET and HEAD
Which could be represented in the router like:
uuid := `[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`
router.AddRegex(":uuid", uuid)
router.AddRegex(":ipv4", `^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$`)
router.HandleFunc("/static/:uuid/:ipv4", handleUUID, "PUT")
router.HandleFunc("/static/:ipv4/:uuid", handleIPV$, "GET,HEAD")
As if that were not enough, also with versioning support
While implementing this, the use of regular expressions was inevitable, but by using them where they were not required, they came with a big penalty.
This is how the router was splitting the path for all requests:
var splitPathRx = regexp.MustCompile(`[^/ ]+`)
// splitPath returns an slice of the path
func (r *Router) splitPath(p string) []string {
	pathParts := splitPathRx.FindAllString(p, -1)
	// root (empty slice)
	if len(pathParts) == 0 {
		pathParts = append(pathParts, "/")
	}
	return pathParts
}pprof ΒΆ
Go comes with “batteries included” and is very easy to start profiling the code by the use of the package pprof.
The first goal was to find the bottlenecks. The following code was used for this:
package main
import (
	"net/http"
	"net/http/pprof"
	"github.com/nbari/violetear"
)
func hello(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("hello"))
}
func main() {
	r := violetear.New()
	r.AddRegex(":word", `^\w+$`)
	r.HandleFunc("/hello", hello, "GET,HEAD")
	r.HandleFunc("/hello/:word/", hello, "GET,HEAD")
	r.HandleFunc("/*", hello, "GET,HEAD")
	// Register pprof handlers
	r.HandleFunc("/debug/pprof/*", pprof.Index)
	r.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	r.HandleFunc("/debug/pprof/profile", pprof.Profile)
	r.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	r.HandleFunc("/debug/pprof/trace", pprof.Trace)
	http.ListenAndServe(":8080", r)
}
Notice the pprof handlers:
// Register pprof handlers
r.HandleFunc("/debug/pprof/*", pprof.Index)
r.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
r.HandleFunc("/debug/pprof/profile", pprof.Profile)
r.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
r.HandleFunc("/debug/pprof/trace", pprof.Trace)
wrk ΒΆ
For making some random requests, wrk + lua do a nice combo, this is the
command I used for testing:
$ wrk -c100 -d10 -t2 -s counter.lua http://0:8080/
And the contents of the counter.lua script:
counter = 0
request = function()
    if counter % 100 == 0 then
	    path = "/"
    else
	    path = path .. counter .. "/"
    end
    wrk.headers["X-Counter"] = counter
    counter = counter + 1
    return wrk.format(nil, path)
end
This basically will make requests in the following way:
/
/.
/./.
/././.
/./././.
/././././.
/./././././.
/././././././.
/./././././././.
/././././././././.
...
It is very basic. This script will test the router with both static and dynamic handlers - source code for this can be found here.
Once everything is setup, the router should be up, running and listening for
requests, wrk is called like mentioned before and in another terminal this
command is used:
go tool pprof http://0:8080/debug/pprof/heap
After running the command you enter into an interactive mode. There I typed
top and got this output:
 1(pprof) top
 2Showing nodes accounting for 4694.50kB, 100% of 4694.50kB total
 3Showing top 10 nodes out of 25
 4      flat  flat%   sum%        cum   cum%
 5 3155.84kB 67.22% 67.22%  3155.84kB 67.22%  regexp.(*bitState).reset /usr/local/Cellar/go/1.9/libexec/src/regexp/backtrack.go
 6     514kB 10.95% 78.17%      514kB 10.95%  bufio.NewWriterSize /usr/local/Cellar/go/1.9/libexec/src/bufio/bufio.go
 7  512.62kB 10.92% 89.09%   512.62kB 10.92%  regexp.(*Regexp).FindAllString.func1 /usr/local/Cellar/go/1.9/libexec/src/regexp/regexp.go
 8  512.03kB 10.91%   100%   512.03kB 10.91%  syscall.anyToSockaddr /usr/local/Cellar/go/1.9/libexec/src/syscall/syscall_bsd.go
 9         0     0%   100%  3668.46kB 78.14%  github.com/nbari/violetear.(*Router).ServeHTTP /Users/nbari/projects/go/src/github.com/nbari/violetear/violetear.go
10         0     0%   100%  3668.46kB 78.14%  github.com/nbari/violetear.(*Router).splitPath /Users/nbari/projects/go/src/github.com/nbari/violetear/violetear.go
11         0     0%   100%   512.03kB 10.91%  internal/poll.(*FD).Accept /usr/local/Cellar/go/1.9/libexec/src/internal/poll/fd_unix.go
12         0     0%   100%   512.03kB 10.91%  internal/poll.accept /usr/local/Cellar/go/1.9/libexec/src/internal/poll/sys_cloexec.go
13         0     0%   100%   512.03kB 10.91%  main.main /Users/nbari/projects/go/src/github.com/nbari/go-sandbox/profile-violetear/profile/main.go
14         0     0%   100%   512.03kB 10.91%  net.(*TCPListener).AcceptTCP /usr/local/Cellar/go/1.9/libexec/src/net/tcpsock.goNotice lines 7 and 10
Still in the interactive mode, I typed: list splitPath which helps to show the
lines of code. This was the output:
 1(pprof) list splitPath
 2Total: 2.02MB
 3ROUTINE ======================== github.com/nbari/violetear.(*Router).splitPath in /Users/nbari/projects/go/src/github.com/nbari/violetear/violetear.go
 4         0     1.02MB (flat, cum) 50.37% of Total
 5         .          .    261:   return
 6         .          .    262:}
 7         .          .    263:
 8         .          .    264:// splitPath returns an slice of the path
 9         .          .    265:func (r *Router) splitPath(p string) []string {
10         .     1.02MB    266:   pathParts := splitPathRx.FindAllString(p, -1)
11         .          .    267:
12         .          .    268:   // root (empty slice)
13         .          .    269:   if len(pathParts) == 0 {
14         .          .    270:           pathParts = append(pathParts, "/")
15         .          .    271:   }
16(pprof)This showed that splitting the path using a regular expression was inefficient. I changed the function to something like this:
func (r *Router) splitPath(p string) []string {
        pathParts := strings.FieldsFunc(p, func(c rune) bool {
                return c == '/'
        })
        // root (empty slice)
        if len(pathParts) == 0 {
                return []string{"/"}
        }
        return pathParts
}
I created a small bench test:
package violetear
import (
	"net/http"
	"testing"
)
func benchRequest(b *testing.B, router http.Handler, r *http.Request) {
	w := &ResponseWriter{}
	u := r.URL
	rq := u.RawQuery
	r.RequestURI = u.RequestURI()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		u.RawQuery = rq
		router.ServeHTTP(w, r)
	}
}
func BenchmarkRouterStatic(b *testing.B) {
	router := New()
	router.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {}, "GET,HEAD")
	r, _ := http.NewRequest("GET", "/hello", nil)
	benchRequest(b, router, r)
}
func BenchmarkRouterDynamic(b *testing.B) {
	router := New()
	router.AddRegex(":word", `^\w+$`)
	router.HandleFunc("/test/:word", func(w http.ResponseWriter, r *http.Request) {}, "GET,HEAD")
	r, _ := http.NewRequest("GET", "/test/foo", nil)
	benchRequest(b, router, r)
}
And run it using:
go test -run=^$ -bench=.
The output was a small win of 3 allocation less π
 1$ go test -run=^$ -bench=.
 22017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
 3goos: darwin
 4goarch: amd64
 5pkg: github.com/nbari/violetear
 6BenchmarkRouterStatic-4         2017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
 72017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
 82017/10/02 18:47:23 Adding path: /hello [GET,HEAD]
 9 2000000               692 ns/op             616 B/op         10 allocs/op
102017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
11BenchmarkRouterDynamic-4        2017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
122017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
132017/10/02 18:47:25 Adding path: /test/:word [GET,HEAD]
14 1000000              1256 ns/op             936 B/op         12 allocs/op
15PASS
16ok      github.com/nbari/violetear      3.380sBefore doing the changes this was the output: 11 allocations for static routes and 14 for dynamic π’
 1$ go test -run=^$ -bench=.
 22017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
 3goos: darwin
 4goarch: amd64
 5pkg: github.com/nbari/violetear
 6BenchmarkRouterStatic-4         2017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
 72017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
 82017/10/02 09:56:34 Adding path: /hello [GET,HEAD]
 91000000              1145 ns/op             776 B/op         11 allocs/op
102017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
11BenchmarkRouterDynamic-4        2017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
122017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
132017/10/02 09:56:35 Adding path: /test/:word [GET,HEAD]
141000000              1835 ns/op            1096 B/op         14 allocs/op
15PASS
16ok      github.com/nbari/violetear      3.029sTo get more info about the allocations, I mainly used
pprof with options like alloc_space:
go tool pprof --alloc_space  http://0:8080/debug/pprof/heap
When setting allocfreetrace=1 to environment variable GODEBUG, you can see
stacktrace where allocation occurs:
allocfreetrace: setting allocfreetrace=1 causes every allocation to be
profiled and a stack trace printed on each object's allocation and free.
To build the tests:
go test -c
And to run the tests writing to a file all the output:
GODEBUG=allocfreetrace=1 ./test.test -test.run=none -test.bench=BenchmarkRouter -test.benchtime=10ms 2>trace.log
After doing all this I was available to save up to 90% of the allocations for static routes, from 11 to 1:
 1$ go test -run=^$ -bench=.
 22017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
 3goos: darwin
 4goarch: amd64
 5pkg: github.com/nbari/violetear
 6BenchmarkRouterStatic-4         2017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
 72017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
 82017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
 92017/10/02 09:51:47 Adding path: /hello [GET,HEAD]
1010000000               213 ns/op              16 B/op          1 allocs/op
112017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
12BenchmarkRouterDynamic-4        2017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
132017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
142017/10/02 09:51:50 Adding path: /test/:word [GET,HEAD]
151000000              1115 ns/op             816 B/op          7 allocs/op
16PASS
17ok      github.com/nbari/violetear      3.507sgo-torch π₯ ΒΆ
To have a better graphical representation about the profiling output go-torch is an excellent tool for creating flame graphs.
The easiest way to run it is like this:
$ docker run uber/go-torch -u http://[address-of-host] -p > torch.svg
user the IP/address of your host, not localhost or 127.0.0.1
This is an example of how it was behaving the router before:
After:
0 allocations
Make it happen π
 1$ go test -run=^$ -bench=.
 22017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
 3goos: darwin
 4goarch: amd64
 5pkg: github.com/nbari/violetear
 6BenchmarkRouterStatic-4         2017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
 72017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
 82017/10/05 21:57:31 Adding path: /hello [GET,HEAD]
 92017/10/05 21:57:32 Adding path: /hello [GET,HEAD]
1010000000               122 ns/op               0 B/op          0 allocs/op
112017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
12BenchmarkRouterDynamic-4        2017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
132017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
142017/10/05 21:57:33 Adding path: /test/:word [GET,HEAD]
152017/10/05 21:57:34 Adding path: /test/:word [GET,HEAD]
16 2000000               913 ns/op             784 B/op          6 allocs/op
17PASS
18ok      github.com/nbari/violetear      4.144sThe router since always has been behaving quite well, this mainly thanks to Go itself and well, to the modest traffic it has been handling.
While doing some benchmarks and simulating some DDOS I found some patterns. All
were related to the way the request.URL.Path was handled. The thing I noticed
was that the bigger the request path was the more memory allocations were used.
At the end was pretty much making sense since by splitting the path by / a
slice was returned with all the possible parts of the path.
For example, for a request like:
/the/quick/brown/fox/jumps/over/the/lazy/dog
The router was creating a slice of 9 items:
[ the quick brown fox jumps over the lazy dog]
In many cases, this would never be defined as routes and could probably just be served by a catch-all(*) handler.
To fix this, instead of splitting and returning a slice, the path is now parsed. Each component of the path from left to right is used to find within the set of defined handlers; so for the same request shown before, the router converts it to:
key := "the"
path := "/quick/brown/fox/jumps/over/the/lazy/dog"
It searches for a possible handler named the and if found then next will try again using
quick and shorting the path to /brown/fox/jumps/over/the/lazy/dog:
key := "quick"
path := "/brown/fox/jumps/over/the/lazy/dog"
key := "brown"
path := "/fox/jumps/over/the/lazy/dog"
key := "fox"
path := "/jumps/over/the/lazy/dog"
And so on until a handler matches the key or a 404 is returned.
This by principle made the router more efficient but now the challenge was how to parse the path without creating allocations.
The tricky part was to avoid concatenating strings runes or bytes, by just using the existing path and return chunks of the slice did the trick, the current code look like is this:
 1func (t *Trie) SplitPath(path string) (string, string) {
 2	var key string
 3	if path == "" {
 4		return key, path
 5	} else if path == "/" {
 6		return path, ""
 7	}
 8	for i := 0; i < len(path); i++ {
 9		if path[i] == '/' {
10			if i == 0 {
11				return t.SplitPath(path[1:])
12			}
13			if i > 0 {
14				key = path[:i]
15				path = path[i:]
16				if path == "/" {
17					return key, ""
18				}
19				return key, path
20			}
21		}
22	}
23	return path, ""
24}The router now is faster and static routes are 0% fat π.
There is still work in progress - who would think that behind a simple router there is too much logic involved. The experience so far has been very pleasant, learning and improving every day more about Go and its niceties.
To make an even better improvement, any tips, comments or a review will be more than welcome π