Polymath Engineer Weekly #48

Links to make your week more enjoyable

May 12, 2023

Hello again. Here are this week's links:

Links of the week

Improving Incident Recovery by Using the SLI Pyramid

The bottom line is that it’s “easy” to mitigate an incident when the mitigation process fixes the underlying issue and returns the system to its correct state. But when the mitigation process puts your system in a different bad state, it’s much harder to decide whether to go down this mitigation path.

The Paradox of “Easy Startups”

Big Missions bring in big talent.
It’s not fashionable to say so, but some people and some teams are 100x, 1000x, or 10000x as productive as others. Recognize this, and recognize what those people want—to put their skills and efforts into the most challenging and most impactful work in the world with other people like them.
And when you have a big big mission — something visionary that inspires people, you can hire missionaries, not just mercenaries. People work smarter, longer, harder and (maybe) in tighter unison in service of a shared mission.

The Leadership Myth in Replicated Databases

However, understanding the difference between these two different leadership roles is very useful in a cloud setting. With the right layering, you can disaggregate your log layer from your database and scale it independently; switch your database from single-primary to multi-primary without changing the consensus protocol; change your consensus protocol to be leaderless without disturbing your database layer, and so on. For a more technical description of these ideas, see the Delos papers from Meta.

A Literate Assembly Language

Since I sometimes develop new CPU architectures, I have a universal cross assembler that is, honestly, an ugly hack, but it works quite well. I’ve talked about it before, but if you don’t want to read the whole post about it, it uses some simple tricks to convert standard-looking assembly language formats into C code that is then compiled. Executing the resulting program outputs the desired machine language into a desired file format. It is very easy to set up, and in the middle, there’s a nice C program that emits machine code. It is not much more readable than the raw assembly, but you shouldn’t have to see it. But what if we started the process there and made the format readable?
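To make the idea of "a program that emits machine code" concrete, here is a minimal sketch in Python rather than C. It is not the author's assembler: the toy 8-bit instruction set, the opcode values, and the names `Program` and `build` are all invented for illustration. Each mnemonic is an ordinary method call that appends bytes, and labels are patched in at the end.

```python
# Hypothetical 8-bit toy ISA, invented for this sketch only.
# Every instruction is one opcode byte, optionally followed by one operand byte.
OPCODES = {"LDI": 0x10, "ADD": 0x20, "JMP": 0x30, "HLT": 0xFF}


class Program:
    """Collects machine-code bytes as instruction methods are called."""

    def __init__(self):
        self.code = bytearray()
        self.labels = {}   # label name -> byte offset
        self.fixups = []   # (offset, label) pairs to patch once labels are known

    def label(self, name):
        # Remember the current offset so jumps can refer to it by name.
        self.labels[name] = len(self.code)

    def ldi(self, value):
        self.code += bytes([OPCODES["LDI"], value & 0xFF])

    def add(self, value):
        self.code += bytes([OPCODES["ADD"], value & 0xFF])

    def jmp(self, target):
        self.code.append(OPCODES["JMP"])
        self.fixups.append((len(self.code), target))
        self.code.append(0)  # placeholder, patched in assemble()

    def hlt(self):
        self.code.append(OPCODES["HLT"])

    def assemble(self):
        # Resolve forward and backward references, then freeze the output.
        for offset, name in self.fixups:
            self.code[offset] = self.labels[name]
        return bytes(self.code)


def build():
    # The "source" is plain code that reads roughly like assembly.
    p = Program()
    p.label("start")
    p.ldi(1)
    p.add(2)
    p.jmp("end")
    p.add(3)          # skipped by the jump
    p.label("end")
    p.hlt()
    return p.assemble()


if __name__ == "__main__":
    print(build().hex(" "))
```

Because the source is an ordinary program, comments, helper functions, and loops come for free, which is roughly the readability angle the post explores.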

The end of a myth: Distributed transactions can scale

Remote direct memory access (RDMA) allows bypassing the CPU when transferring data from one machine to another. This helps relieve a major limiting factor in the scalability of distributed transactions: the CPU overhead of the TCP/IP stack. With so many messages to process, the CPU may spend most of its time serializing/deserializing network messages, leaving little room for the actual work. We had seen this phenomenon first hand when we were researching the performance bottlenecks of Paxos protocols.

▶ Enter PaLM 2 (New Bard): Full Breakdown - 92 Pages Read and Gemini Before GPT 5? Google I/O

Google puts its foot on the accelerator, casting aside safety concerns to not only release a GPT 4-competitive model, PaLM 2, but also announce that they are already training Gemini, a GPT 5 competitor [likely on TPU v5 chips]. This is truly a major day in AI history, and I try to cover it all.
I'll show the benchmarks in which PaLM 2 (which now powers Bard) beats GPT 4, and detail how they use SmartGPT-like techniques to boost performance. Crazily enough, PaLM 2 beats even Google Translate, due in large part to the text it was trained on. We'll talk coding in Bard, translation, MMLU, Big Bench, and much more.

Book of the Week

Systems Performance

Do you have any more links our community should read? Feel free to post them in the comments.

Have a nice week. 😉

Have you read last week's post? Check the archive.
