Blocking Go Readers and Writers

TL;DR: Go Readers/Writers are blocking, and you should be aware of it when using them.

Go Readers and Writers have pretty slick interfaces.
In a few lines, a really naive way of implementing a reader would be the following:

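package main

import "fmt"

// dataHolder keeps a fixed-size byte slice in memory.
type dataHolder struct {
  data []byte
}

// Read copies d.data into p. Naive on purpose: it always starts from the
// beginning of d.data and never returns io.EOF.
func (d *dataHolder) Read(p []byte) (int, error) {
  i := copy(p, d.data)
  return i, nil
}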

The writer bit can also have a very naive implementation:

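// Write copies p into the pre-allocated d.data. Anything that does not
// fit is silently dropped, which a real writer should report as an error.
func (d *dataHolder) Write(p []byte) (int, error) {
  i := copy(d.data, p)
  return i, nil
}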

And, to call our code:

func main() {
  data := make([]byte, 11)
  rw := &dataHolder{make([]byte, 11)}

  _, _ = rw.Write([]byte("hello world"))
  _, _ = rw.Read(data)
  fmt.Println(string(data))
}

Read takes a []byte and copies its own data into it. Write takes a []byte too, then reads and stores that slice’s data.

Note: this snippet is a sample to show how readers/writers work. You should almost never have to call Read manually. Instead, use ioutil.ReadAll (io.ReadAll since Go 1.16).

For more information on readers and writers, Ben Johnson’s Walkthrough of the io package is an awesome resource.

Back to our issue

The subject of this post is not an introduction to readers and writers though. It’s about the fact that they are blocking, and how that can cause issues.

Because a reader can hold more data than fits in len(p), the Read method can be called multiple times, and each call picks up where the previous one left off. Reading stops when Read returns io.EOF as its error.
It can also take a while until all the data is read (if a network call is required, for example). And Read is not supposed to return 0 bytes with a nil error, so a reader that has no data yet typically blocks until some arrives.
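The read loop described above can be sketched as follows. This is a minimal illustration, not code from busl: readInChunks and chunksOf are hypothetical helpers, and strings.NewReader stands in for any reader.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readInChunks (hypothetical helper) drains r using a buffer of bufSize
// bytes and returns every chunk it saw, in order.
func readInChunks(r io.Reader, bufSize int) ([]string, error) {
	var chunks []string
	buf := make([]byte, bufSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			// Handle the bytes before looking at err: a reader may
			// return data and io.EOF from the same call.
			chunks = append(chunks, string(buf[:n]))
		}
		if err == io.EOF {
			return chunks, nil // end of stream, not a failure
		}
		if err != nil {
			return chunks, err
		}
	}
}

// chunksOf runs readInChunks over an in-memory string.
func chunksOf(s string, bufSize int) []string {
	chunks, _ := readInChunks(strings.NewReader(s), bufSize)
	return chunks
}

func main() {
	// 11 bytes read through a 4-byte buffer take three Read calls,
	// plus a final one that only reports io.EOF.
	fmt.Printf("%q\n", chunksOf("hello world", 4))
}
```

Each call resumes where the previous one stopped, which is why the three chunks come back in order and without overlap.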

So you could end up with a call to Read taking several seconds until data comes in. Add to that the fact that you can’t read the same data from a Reader twice.
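Both behaviors are easy to reproduce. In the sketch below, io.Pipe stands in for a slow network peer and strings.NewReader for any one-shot source; timeBlockedRead and readTwice are hypothetical names chosen for the illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// timeBlockedRead starts a writer that stays silent for delay, then
// reports how long a single Read call waited and what it received.
func timeBlockedRead(delay time.Duration) (time.Duration, string) {
	pr, pw := io.Pipe()
	go func() {
		time.Sleep(delay) // stand-in for a slow network peer
		pw.Write([]byte("late"))
		pw.Close()
	}()

	buf := make([]byte, 16)
	start := time.Now()
	n, _ := pr.Read(buf) // blocks until the writer produces something
	return time.Since(start), string(buf[:n])
}

// readTwice drains the same reader two times and returns both results.
func readTwice(s string) (string, string) {
	r := strings.NewReader(s)
	first, _ := io.ReadAll(r)
	second, _ := io.ReadAll(r) // nothing left: the data was consumed
	return string(first), string(second)
}

func main() {
	waited, got := timeBlockedRead(200 * time.Millisecond)
	fmt.Printf("Read returned %q after roughly %v\n", got, waited.Round(100*time.Millisecond))

	first, second := readTwice("hello")
	fmt.Printf("first=%q second=%q\n", first, second)
}
```

The second ReadAll returns an empty result: once bytes have been handed to a caller, the reader does not offer them again.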

A simple example

The following example details how busl, one of the Heroku components, uses readers, and the reader-blocking issue we’ve been facing.

busl

busl is an HTTP pub/sub app built to run on Heroku.
The publisher can send any bytes of data, and they will be streamed back to the consumers in text/stream or SSE format. Once the publisher closes the stream, it is archived and can still be retrieved by the consumers.

We use it to capture logs from Heroku builds, CI and release phase. This is how you can view them in the git push output or the dashboard, or have them streamed by any API client.

Handling reconnections

Just like any other Heroku application, whenever it is deployed or needs to be restarted, it will receive a SIGTERM and have 30 seconds to shut down before being force-killed.
Unfortunately, our streams can last much longer than that. Builds can run for up to 60 minutes, and release phase commands can run for up to 24 hours.

So we need to let producers and subscribers disconnect and retry if the stream hasn’t finished yet.

This transport includes the logic behind producer reconnects. At a high level, it does the following:

  • The running command pipes its stdout/stderr to the producer’s stdin.
  • The producer opens an HTTP request to the busl stream, and sends the bytes it receives.

When it gets disconnected, it does the following:

  • Reopens the connection
  • Resends all the data it previously received*
  • Keeps sending the data until it gets disconnected or the command finishes running.

*We don’t know if the server properly received all the sent bytes or not. So instead of making an additional request to fetch the number of bytes previously received, we resend everything and the server discards everything it already received.
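The "resend everything, discard what was already received" idea can be sketched like this. It is only an illustration of the technique: busl's actual server code is not shown in this post, and skipReceived and newBytes are hypothetical helpers that assume the server tracks how many bytes it has stored for a stream.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// skipReceived throws away the first `received` bytes of a resent
// stream, leaving the reader positioned at the first unseen byte.
// Assumption: the server knows how many bytes it already stored.
func skipReceived(resent io.Reader, received int64) error {
	_, err := io.CopyN(io.Discard, resent, received)
	return err
}

// newBytes returns the part of a resent payload the server has not
// seen yet.
func newBytes(resent string, received int64) (string, error) {
	r := strings.NewReader(resent)
	if err := skipReceived(r, received); err != nil {
		return "", err
	}
	rest, err := io.ReadAll(r)
	return string(rest), err
}

func main() {
	// The server already holds "hello " (6 bytes); after a reconnect
	// the producer resends the whole stream from the start.
	rest, err := newBytes("hello world", 6)
	fmt.Printf("%q %v\n", rest, err)
}
```

The producer stays simple, since it never needs to ask the server where to resume; the cost is the extra bandwidth of resending bytes the server will drop.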

Our first approach was to have a reader reading stdin, forwarding that data back to the server and storing it in a file buffer.
Then on disconnect, a new reader would start reading stdin, send the buffer and move back to sending the live data.

Using a MultiReader and a TeeReader, this was pretty straightforward.
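A rough sketch of that first approach, under a few assumptions: a bytes.Buffer stands in for the file buffer, a strings.Reader stands in for stdin, and newAttemptReader and simulateReconnect are hypothetical names, not the original code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// newAttemptReader builds the payload for one connection attempt:
// first a replay of everything buffered so far, then the live input,
// which the TeeReader also copies into the buffer as it is read.
func newAttemptReader(buffer *bytes.Buffer, live io.Reader) io.Reader {
	// Copy the buffered bytes so later writes cannot affect the replay.
	replay := bytes.NewReader(append([]byte(nil), buffer.Bytes()...))
	return io.MultiReader(replay, io.TeeReader(live, buffer))
}

// simulateReconnect sends 5 bytes on a first attempt, "loses" the
// connection, then sends a second attempt to completion.
func simulateReconnect() (resent, buffered string) {
	live := strings.NewReader("hello world") // stand-in for stdin
	buffer := &bytes.Buffer{}                // stand-in for the file buffer

	first := newAttemptReader(buffer, live)
	io.CopyN(io.Discard, first, 5) // connection drops after "hello"

	second := newAttemptReader(buffer, live)
	out, _ := io.ReadAll(second) // replay "hello", then the live " world"
	return string(out), buffer.String()
}

func main() {
	resent, buffered := simulateReconnect()
	fmt.Printf("resent=%q buffered=%q\n", resent, buffered)
}
```

This works here only because nothing is blocked inside Read when the first attempt ends, which is exactly the assumption that fails with a real stdin.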

Unfortunately, that didn’t work (or this article wouldn’t exist, right?).

Blocked readers

As mentioned earlier, it can take a while until the reader actually receives data.
If we lose the connection while the reader is idling, we open a new connection with a new reader reading from the buffer, then moving on to the live data.

Except we’re still reading from stdin in our blocked reader. And stdin is a reader like any other, meaning we can’t read the same data from it twice.
So if we have two readers trying to read it at the same time, only the first one will get the data. 💥

As soon as we get some data, it can be sent to the old reader instead of the new one. We can get that old reader to send an EOF when it gets that data. But we still need to idle until we receive something as we’re blocked.

The effect is that we can lose some bytes of data every time we get a disconnection.
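The failure mode can be reproduced in a few lines. This sketch uses io.Pipe as a stand-in for stdin and a hypothetical staleReaderDemo function; the sequencing is forced so the outcome is deterministic, whereas in production it depends on timing.

```go
package main

import (
	"fmt"
	"io"
)

// staleReaderDemo returns what the old (stale) reader and the new
// reader each end up with when one chunk arrives on a shared source.
func staleReaderDemo() (oldGot, newGot string) {
	stdin, producer := io.Pipe() // stand-in for the process's stdin

	// The old connection's reader is parked inside Read, waiting for
	// data, at the moment the connection drops.
	oldDone := make(chan string)
	go func() {
		buf := make([]byte, 32)
		n, _ := stdin.Read(buf)
		oldDone <- string(buf[:n])
	}()

	// Data finally arrives. The pipe hands it to the reader that is
	// waiting, which is the old one.
	producer.Write([]byte("log line\n"))
	oldGot = <-oldDone
	producer.Close()

	// The new connection's reader finds nothing left to read.
	rest, _ := io.ReadAll(stdin)
	return oldGot, string(rest)
}

func main() {
	oldGot, newGot := staleReaderDemo()
	fmt.Printf("old reader got %q, new reader got %q\n", oldGot, newGot)
}
```

Unless the old reader forwards what it received somewhere the new connection can see, those bytes never reach the server.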

Our solution

After trying to firefight this for a few days, we decided to take an entirely different approach.

Instead of streaming the data directly to the client, we first store it in a buffer file.
That way, we don’t care about disconnections when receiving output data. It is stored in a local file, which won’t have connectivity issues.

Then, each connection reads from that file. The trick is that it should ignore EOF until we tell the stream it is done.
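One way to sketch a reader that ignores EOF until told otherwise is shown below. The followReader type, its polling interval and the followDemo helper are assumptions made for this illustration; the actual implementation may signal completion and wait for new data differently.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// followReader (hypothetical) reads from r and treats io.EOF as
// "no data yet" until the done channel is closed.
type followReader struct {
	r    io.Reader
	done <-chan struct{}
	poll time.Duration // how long to wait before retrying at EOF
}

func (f *followReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	for {
		// Check done before reading, so anything written before done
		// was closed is still picked up by the Read that follows.
		finished := false
		select {
		case <-f.done:
			finished = true
		default:
		}

		n, err := f.r.Read(p)
		if n > 0 {
			if err == io.EOF {
				err = nil // let a later call decide about EOF
			}
			return n, err
		}
		if err != nil && err != io.EOF {
			return 0, err
		}
		if finished && err == io.EOF {
			return 0, io.EOF // the writer is done, so EOF is real
		}
		time.Sleep(f.poll) // nothing new yet: wait and retry
	}
}

// followDemo writes two lines to a temp file with a delay and reads
// them back through a followReader opened on the same file.
func followDemo() (string, error) {
	w, err := os.CreateTemp("", "buffer-*.log")
	if err != nil {
		return "", err
	}
	defer os.Remove(w.Name())
	defer w.Close()

	r, err := os.Open(w.Name())
	if err != nil {
		return "", err
	}
	defer r.Close()

	done := make(chan struct{})
	go func() {
		for _, line := range []string{"first\n", "second\n"} {
			time.Sleep(20 * time.Millisecond) // output trickles in
			w.WriteString(line)
		}
		close(done) // the command finished
	}()

	out, err := io.ReadAll(&followReader{r: r, done: done, poll: 5 * time.Millisecond})
	return string(out), err
}

func main() {
	out, err := followDemo()
	fmt.Printf("%q %v\n", out, err)
}
```

Because every connection opens the file on its own, a reader left over from a dropped connection cannot take bytes away from the next one.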