Information Compression

Why the wrong thing happens

This is a memo I sent to the Scale team in 2019 which has meaningfully changed how many people at the company have operated. I hope it is similarly useful to those not at Scale:


One oft-cited reason organizations get less efficient as the number of people increases is “communication overhead”. It’s a simple model. As a team grows, you spend a greater percentage of time communicating (the number of pairwise communication channels scales with n-squared), so you become less efficient.
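To make the n-squared claim concrete, here is a minimal sketch that counts the distinct person-to-person channels in a team of n people, assuming every pair may need to talk. The function name `pairwise_channels` is illustrative and not from the original memo.

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct person-to-person channels in a team of n people.

    Assumes every pair of people is a potential channel: n * (n - 1) / 2.
    """
    return n * (n - 1) // 2


# Growing the team 10x grows the number of channels roughly 100x.
for team_size in (5, 50, 500):
    print(team_size, pairwise_channels(team_size))
# prints:
# 5 10
# 50 1225
# 500 124750
```

The count is n(n-1)/2, which grows on the order of n-squared, so the share of time spent communicating rises much faster than headcount.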

While it might be true, I think the far more salient reason is that information compression is incredibly hard.

Unfortunately, we’re all bad information compressors—it’s part of being human.

The real world is complex. For almost anything, there is some grisly reality filled with nuance, nonsensical complications, and randomness (“what’s really going on”). But, when we communicate with each other, we usually speak in a handful of complete thoughts. When Person A has to communicate a complex idea to Person B, she invariably is going to leave out a lot of the details and say something accidentally too simple. This process of translating the mental image of a complex system to something human-readable (usually words or pictures) is information compression, and what Person B then understands from that message (“the compression”) is the decompressed image.

Ideas, which are often very complex, have to pass through the very narrow window of human language, and often all the nuance is lost.

This process is almost never successful. By the end, Person B’s decompressed image is almost always a deeply flawed recreation of “what’s really going on”. Person B is then unable to address “what’s really going on” based on her flawed decompressed image. Worse yet, Person B then does work thinking she’s solving the problem without actually solving it.

This inefficiency is far more costly than the simple increase in communication overhead, and you see this very saliently in big companies. Most people are doing work that doesn’t matter at all, and either they don’t know, or they know and can’t do anything about it.

Put another way, nobody is telling the full truth. Mostly by omission, and mostly unintentionally.

What would it mean for this process to be successful? It’s usually impossible for Person B to truly understand “what’s really going on”, but in the best outcomes, Person B understands “what matters most”. She knows the needle movers that will really make a difference in “what’s really going on”, or in a more mathy framing, she understands the directions along which the gradient is steepest (“features”).

Reliably good compressions are really valuable, because then Person B can make a tangible impact on the problem at hand.

There’s a “customer framing” of this phenomenon. There’s a “customer problem” (customer loosely defined here; it could be anybody who you’re trying to help), and their compression of that customer problem is simply what they ask for. What they ask for is rarely what they really want. Most of the time, what ends up getting built doesn’t really solve the customer problem, and what’s worse, it probably took way longer to build than “optimal”, i.e. the simplest possible solution to the actual customer problem.

We have all seen this go wrong many, many times, and it is probably going wrong right now in 5 different places within Scale. Unless the problem solver is incredibly curious and spends a lot of time asking more questions (getting a better compression), the “wrong” thing will almost always be built.

Note: Here, “wrong” means very far from optimal on the efficient frontier of “how easy something is” and “how much it makes things better”.

One tax that’s difficult to appreciate, but extremely costly, is the tax of building solutions that are very complex and solve the problem poorly. The most common source of this is a non-coder asking a coder to do something that they think is easy, but that turns out not to be easy at all.

Where does this go wrong? Let’s look at what happens in Person B’s head, the decompression function:

The decompression function combines a few signals together:

  1. the compression / message

  2. the shared context, namely what Person A and Person B both understand about “what’s really going on” because of past shared experience

  3. the prior, representing all of Person B’s past experience and philosophies that shape how they interpret the information.

You can start to see why a small startup (<5 people) can be so efficient. Everybody is seeing everything else that’s happening, both internally and with the customer. So, the shared context is very strong, both internally and with the customer. The more subtle thing is that most people early at a startup are ideologically and experientially similar, so their priors are very similar. That means they only need a few bits to communicate complex ideas.
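One way to read “only a few bits” is through Shannon surprisal: the more probability the listener’s prior already puts on the intended idea, the fewer bits the message has to carry. The sketch below is a toy model and not part of the original memo. The priors, the idea names, and the `bits_needed` helper are all made-up assumptions for illustration.

```python
import math


def bits_needed(prior: dict[str, float], idea: str) -> float:
    """Shannon surprisal in bits: -log2 of the prior probability of `idea`.

    `prior` is a hypothetical stand-in for Person B's shared context and
    prior combined: how likely B thinks each interpretation is before
    hearing any message.
    """
    return -math.log2(prior[idea])


intended = "shorten signup form"

# Hypothetical small-startup listener: strong shared context, so the
# intended idea is already the leading candidate.
startup_prior = {intended: 0.5, "rewrite docs": 0.25, "new pricing page": 0.25}

# Hypothetical siloed listener: 64 equally plausible interpretations.
silo_prior = {f"interpretation {i}": 1 / 64 for i in range(63)}
silo_prior[intended] = 1 / 64

print(bits_needed(startup_prior, intended))  # 1.0
print(bits_needed(silo_prior, intended))     # 6.0
```

In this toy setup the same idea costs 1 bit to convey inside the startup and 6 bits across the silo. Any message shorter than that leaves the siloed listener guessing among interpretations.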

On the other hand, in a more “silo’d” company, when “handoffs” happen, there’s generally very little shared context, nor are the priors generally shared between individuals on separate teams. This implies two important laws of growing companies that I deeply believe:

  1. Vertical integration is always better if you can pull it off, minimizing handoffs.

    [sidenote: this is a big reason why Scale has organized our engineering teams to be verticalized. The costs of these limited-context handoffs were insidiously adding up]

  2. A strong culture is absolutely necessary for efficiency, because it is what maintains shared priors.

What’s the antidote to this issue? The best solution is to solve your own problems. Avoid handing off. If you deeply understand the problem, then all of a sudden the information bottleneck goes away. Your brain can simply store all the latent complexity of the problem, and our brains happen to be much better than words or pictures at this.

If you spend the time to actually feel the pain of the problem and deeply understand it, then you don’t need to rely on some lousy compression and you can solve the problem head-on.

There are many forms of this:

  • Dogfooding is very important. Everyone at Scale should spend time labeling data and training a model.

  • Curiosity about the customer problem is arguably the most important thing while selling. Customers won’t volunteer enough information outright, because all information compressions are bad. You need to really fill in the gaps in your understanding to be able to reconstruct an accurate decompressed image.

  • Coders should learn to do other things and vice versa.

  • Get in the weeds. Don’t be afraid to spend time gaining a huge amount of latent context to properly understand your problems.

  • Hiring from your customers can be incredible for your product because they bring a lot of context on how you can do things better.

The solution to these information compression woes is tight-knit teams that bring the customer as close to the code as possible. The best results always come from a tight marriage between the communications with the customer and the code that gets written. In the best case, these are all the same person; at the very least, these should happen in the same room with the same people.

The ultimate hack is to not rely on compressions at all. Be an engineer, salesperson, support representative, marketing person, operations associate, and develop uncompressed understandings of how everything works together. That has been the key to every great product I know of.

Thank you to Sam Altman and Tim Junio for reading drafts of this post.