Sunday, August 18, 2013

Rewriting a large production system in Go

My team at Google is wrapping up an effort to rewrite a large production system (almost) entirely in Go. I say "almost" because one component of the system -- a library for transcoding between image formats -- works perfectly well in C++, so we decided to leave it as-is. But the rest of the system is 100% Go, not just wrappers to existing modules in C++ or another language. It's been a fun experience and I thought I'd share some lessons learned.

Plus, the Go language has a cute mascot ... awwww!

Why rewrite?

The first question we must answer is why we considered a rewrite in the first place. When we started this project, we adopted an existing C++ based system, which had been developed over the course of a couple of years by two of our sister teams at Google. It's a good system and does its job remarkably well. However, it has been used in several different projects with vastly different goals, leading to a nontrivial accretion of cruft. Over time, it became apparent that continuing to innovate rapidly on this large, shared codebase would be extremely challenging for us. This is not a ding on the original developers -- it is just a fact that when certain design decisions become ossified, it becomes more difficult to rethink them, especially when multiple teams are sharing the code.

Before doing the rewrite, we realized we needed only a small subset of the functionality of the original system -- perhaps 20% (or less) of what the other projects were doing with it. We were also looking at making some radical changes to its core logic, and wanted to experiment with new features in a way that would not impact the velocity of our team or the others using the code. Finally, the cognitive burden associated with making changes to this large, shared codebase was unbearable -- almost any change required touching lots of code that the developer did not fully understand, and updating test cases with unclear consequences for the other users of the code.

So, we decided to fork off and do a from-scratch rewrite. The bet we made was that taking a productivity hit during the initial rewrite would pay off in spades when we were able to add more features over time. It has also given us an opportunity to rethink some of the core design decisions of our system, which has been extremely valuable for improving our own understanding of its workings.

Why Go?

I'll admit that at first I was highly skeptical of using Go. This production system sits directly on the serving path between users and their content, so it has to be fast. It also has to handle a large query volume, so CPU and memory efficiency are key. Go's reliance on garbage collection gave me pause (pun intended ... har har har), given how much pain Java developers go through to manage their memory footprint. Also, I was not sure how well Go would be supported for the kind of development we wanted to do inside of Google. Our system has lots of dependencies, and the last thing I wanted was to have to reinvent lots of libraries in Go that we already had in C++. Finally, there was also simply the fear of the unknown.

My whole attitude changed when Michael Piatek (one of the star engineers in the group) sent me an initial cut at the core system rewrite in Go, the result of less than a week's work. Unlike the original C++ based system, I could actually read the code, even though I didn't know Go (yet). The #1 benefit we get from Go is the lightweight concurrency provided by goroutines. Instead of a messy chain of dozens of asynchronous callbacks spread over tens of source files, the core logic of the system fits in a couple hundred lines of code, all in the same file. You just read it from top to bottom, and it makes sense.
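To give a flavor of what that looks like, here is a minimal sketch, not the production code. The backend names and the fetch function are hypothetical stand-ins for real backend calls. Each call runs in its own goroutine, yet the logic still reads top to bottom with no callbacks in sight.

```go
package main

import (
	"fmt"
	"sync"
)

// fetch is a hypothetical stand-in for a call to some backend service.
func fetch(name string) string {
	return "data from " + name
}

// fetchAll issues one fetch per backend concurrently and returns the
// results in the same order as the input slice.
func fetchAll(backends []string) []string {
	results := make([]string, len(backends))
	var wg sync.WaitGroup
	for i, b := range backends {
		wg.Add(1)
		go func(i int, b string) {
			defer wg.Done()
			results[i] = fetch(b) // each goroutine writes only its own slot
		}(i, b)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	for _, r := range fetchAll([]string{"metadata", "thumbnails", "acl"}) {
		fmt.Println(r)
	}
}
```

The sync.WaitGroup is one of several ways to wait for the goroutines; channels would work just as well here.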

Michael also made the observation that Go is a language designed for writing Web-based services. Its standard libraries provide all of the machinery you need for serving HTTP, processing URLs, dealing with sockets, doing crypto, processing dates and timestamps, and doing compression. Unlike, say, Python, Go is a compiled language and therefore very fast. Go's modular design makes for beautiful decomposition of code across modules, with clear explicit dependencies between them. Its incremental compilation approach makes builds lightning fast. Automatic memory management means you never have to worry about freeing memory (although the usual caveats with a GC-based language apply).
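As a small illustration of how far the standard library alone gets you, here is a sketch of an HTTP handler that parses a query parameter. The handler, the "name" parameter, and the /hello path are made up for this example. It is exercised in-process with net/http/httptest, so it runs without opening a socket.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// helloHandler greets the name given in the "name" query parameter,
// falling back to "gopher" when the parameter is missing.
func helloHandler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	if name == "" {
		name = "gopher"
	}
	fmt.Fprintf(w, "hello, %s", name)
}

// get runs a GET request against helloHandler in-process and returns
// the response body.
func get(target string) string {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	rec := httptest.NewRecorder()
	helloHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(get("/hello?name=Go"))
	fmt.Println(get("/hello"))
}
```

To serve real traffic, the same handler could be passed to http.ListenAndServe via http.HandlerFunc; the port to listen on would be up to you.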

Being terse

Syntactically, Go is very succinct. Indeed, the Go style guidelines encourage you to write code as tersely as possible. At first this drove me up the wall, since I was used to using long descriptive variable names and spreading expressions over as many lines as possible. But now I appreciate the terse coding approach, as it makes reading and understanding the code later much, much easier.

Personally, I really like coding in Go. I can get to the point without having to write a bunch of boilerplate just to make the compiler happy. Unlike in C++, I don't have to split the logic of my code across header files and .cc files. Unlike in Java, I don't have to write anything that the compiler can infer, including the types of variables. Go feels a lot like coding in a lean scripting language, like Python, but you get type safety for free.

Our Go-based rewrite is 121 Go source files totaling about 21K lines of code (including comments). Compare that to the original system, which was 1400 C++ source files with 460K lines of code. (Remember what I said about the new system implementing a small subset of the original system's functionality, though I do feel that the code size reduction is disproportionate to the functionality reduction.)

What about ramp-up time?

Learning Go is easy coming from a C-like language background. There are no real surprises in the language; it pretty much makes sense. The standard libraries are very well documented, and there are plenty of online tutorials. None of the engineers on the team have taken very long at all to come up to speed in the language; heck, even one of our interns picked it up in a couple of days.

Overall, the rewrite has taken about 5 months and is already running in production. We have also implemented 3 or 4 major new features that would have taken much longer to implement in the original C++ based system, for the reasons described above. I estimate that our team's productivity has been improved by at least a factor of ten by moving to the new codebase, and by using Go.

Why not Go?

There are a few things about Go that I'm not super happy about, and that tend to bite me from time to time.

First, you need to "know" whether the variable you are dealing with is an interface or a struct. Structs can implement interfaces, of course, so in general you tend to treat these as the same thing. But when you're dealing with a struct, you might be passing by reference, in which case the type is *myStruct, or you might be passing by value, in which case the type is just myStruct. If, on the other hand, the thing you're dealing with is "just" an interface, you never have a pointer to it -- an interface is a pointer in some sense. It can get confusing when you're looking at code that is passing things around without the * to remember that it might actually "be a pointer" if it's an interface rather than a struct.
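A minimal sketch of the confusion, with a made-up Counter interface and an illustrative myStruct: three functions that look similar at the call site but differ in whether the caller's struct is modified.

```go
package main

import "fmt"

// Counter is a hypothetical interface used only for this illustration.
type Counter interface {
	Inc()
	Value() int
}

type myStruct struct{ n int }

// Both methods have pointer receivers, so *myStruct implements Counter.
func (s *myStruct) Inc()       { s.n++ }
func (s *myStruct) Value() int { return s.n }

// byValue receives a copy; the caller's struct is left unchanged.
func byValue(s myStruct) { s.Inc() }

// byPointer mutates the caller's struct through the pointer.
func byPointer(s *myStruct) { s.Inc() }

// viaInterface has no * in its signature, yet it also mutates the
// caller's struct, because the interface value holds a *myStruct.
func viaInterface(c Counter) { c.Inc() }

func main() {
	s := myStruct{}
	byValue(s)
	fmt.Println(s.Value()) // 0
	byPointer(&s)
	fmt.Println(s.Value()) // 1
	viaInterface(&s)
	fmt.Println(s.Value()) // 2
}
```

Nothing in the signature of viaInterface hints that it can modify its argument; you have to know that Counter is an interface and what is stored in it.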

Go's type inference makes for lean code, but requires you to dig a little to figure out what the type of a given variable is if it's not explicit. So given code like:
foo, bar := someFunc(baz) 
you'd really like to know what foo and bar actually are, in case you want to add some new code to operate on them. If I could get out of the 1970s and use an editor other than vi, maybe I would get some help from an IDE in this regard, but I staunchly refuse to edit code with any tool that requires using a mouse.
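One low-tech workaround is to print the types with fmt's %T verb. The sketch below assumes a hypothetical someFunc that splits a string into words; the real function could return anything, which is exactly the problem.

```go
package main

import (
	"fmt"
	"strings"
)

// someFunc is a hypothetical function with the same call shape as in
// the text: it returns the word count and the words of its input.
func someFunc(baz string) (int, []string) {
	fields := strings.Fields(baz)
	return len(fields), fields
}

// typeNames reports the dynamic type of each argument using %T.
func typeNames(vals ...interface{}) []string {
	names := make([]string, len(vals))
	for i, v := range vals {
		names[i] = fmt.Sprintf("%T", v)
	}
	return names
}

func main() {
	foo, bar := someFunc("a b c")
	fmt.Println(typeNames(foo, bar)) // [int []string]
}
```

Declaring the variables explicitly (var foo int, var bar []string) before assigning to them is the other option, at the cost of some of the terseness.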

Finally, Go's liberal use of interfaces allows a struct to implement an interface "by accident". You never have to explicitly declare that a given struct implements a particular interface, although it's good coding style to mention this in the comments. The problem with this is that it can be difficult to tell when you are reading a given segment of code whether the developer intended for their struct to implement the interface that they appear to be projecting onto it. Also, if you want to refactor an interface, you have to go find all of its (undeclared) implementations more or less by hand.
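A common idiom that helps here is a compile-time assertion: assign a value of the struct type to a blank variable of the interface type. The Shouter interface and gopher struct below are invented for illustration. If gopher ever stops satisfying Shouter, the build fails at the assertion line, and the line doubles as a declaration of intent.

```go
package main

import (
	"fmt"
	"strings"
)

// Shouter is a hypothetical interface used only for this illustration.
type Shouter interface {
	Shout() string
}

type gopher struct{ name string }

// Shout makes gopher satisfy Shouter; no "implements" clause is needed.
func (g gopher) Shout() string { return strings.ToUpper(g.name) + "!" }

// Compile-time assertion: the build breaks here if gopher no longer
// implements Shouter, which also documents the intent for readers.
var _ Shouter = gopher{}

// announce accepts anything that satisfies Shouter.
func announce(s Shouter) string { return s.Shout() }

func main() {
	fmt.Println(announce(gopher{name: "go"})) // GO!
}
```

Searching the codebase for such assertion lines is also a quick way to find the intended implementations before refactoring an interface, though it only finds the ones whose authors bothered to write the assertion.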

Most of all I find coding in Go really, really fun. This is a bad thing, since we all know that "real" programming is supposed to be a grueling, painful exercise of fighting with the compiler and tools. So programming in Go is making me soft. One day I'll find myself in the octagon with a bunch of sweaty, muscular C++ programmers bare-knuckling it out to the death, and I just know they're going to mop the floor with me. That's OK; until then I'll just keep on cuddling my stuffed gopher and running gofmt to auto-indent my code.

ObDisclaimer: Everything in this post is my personal opinion and does not represent the view of my employer.
