Polymath Engineer Weekly #71
LSM trees, Load testing, Go, GTA III, UltraFastBERT, C++
Links of the week
Compaction is critical to systems built on the Log-Structured Merge Tree (LSM-tree) architecture. High-throughput appends to the log, plus constant flushes from memory to disk whenever data exhausts memory capacity, cause growing overlap among data ranges and accumulation of old versions of the same key. This degrades read performance and inflates space usage. The compaction mechanism addresses both problems: periodic background tasks continuously recycle old data and merge multiple layers into one. However, the choice of compaction policy and task-scheduling method then becomes a problem of its own.
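The core of that merge step can be sketched in a few lines. This is a minimal illustration, not any particular engine's implementation: several sorted runs (newest first) are merged into one run, newer versions of a key shadow older ones, and tombstones are dropped.

```python
# Minimal sketch of LSM-tree compaction (all names hypothetical): merge
# several sorted runs into one, keeping only the newest value per key.
# Runs are ordered newest-first; a value of None is a deletion tombstone.

def compact(runs):
    """Merge sorted runs (newest first) into a single sorted run."""
    merged = {}
    # Apply the oldest run first so newer runs overwrite older values.
    for run in reversed(runs):
        for key, value in run:
            merged[key] = value
    # Drop tombstones and emit keys in sorted order.
    return [(k, v) for k, v in sorted(merged.items()) if v is not None]

newest = [("a", 1), ("c", None)]         # "c" was deleted recently
older  = [("a", 0), ("b", 2), ("c", 3)]
print(compact([newest, older]))          # [('a', 1), ('b', 2)]
```

Real engines stream the merge instead of materializing a dict, but the policy question the excerpt raises is exactly about when and how often to run this step.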
One major decision I made was to sacrifice some amount of precision to achieve high scale. Designing distributed systems is all about making tradeoffs given competing non-functional priorities. In single-host mode, TPSGenerator could be 100% accurate. You wanted to generate 23,476.76 TPS for 15 seconds? You got exactly 23,476.76 TPS for 15 seconds. But in multi-host mode, being able to achieve a load in the millions of TPS was more important than that exact precision. A lot of the mechanisms I had in place to self-regulate had several seconds of delay, so you couldn’t run at 4,000,007 TPS for 20 seconds then switch to running at 4,000,009 TPS for 10 seconds. It was optimized for longer running times and large throughputs at the expense of some precision. Tradeoffs in distributed systems!
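TPSGenerator's internals aren't shown in the excerpt, but the single-host, fully precise mode it describes can be sketched with simple pacing (all names here are hypothetical): fire, advance a schedule by 1/TPS, and sleep off the remainder. It is the feedback loops needed to coordinate this across many hosts that introduce the seconds of delay mentioned above.

```python
import time

# Hypothetical sketch of single-host load pacing: sleep between calls so
# the average rate converges to the target TPS over the run's duration.

def run_at_tps(target_tps, duration_s, fire):
    interval = 1.0 / target_tps
    deadline = time.monotonic() + duration_s
    next_shot = time.monotonic()
    sent = 0
    while time.monotonic() < deadline:
        fire()
        sent += 1
        next_shot += interval
        # Sleep only for the remaining slack; skip if behind schedule.
        delay = next_shot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return sent
```

In multi-host mode this per-host schedule becomes a target handed down by a coordinator, and that indirection is where exact precision gives way to scale.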
Below are some useful, versatile code snippets picked at random from my utilities library, with no particular categorization and no system-specific tricks.
It turned out they couldn't decide what the size of the moon should be. Two of them wanted it smaller, to be more realistic. The other two wanted it larger, to be more cinematic.
This went on for a bit, so I suggested making the size of the moon changeable in the game. That way they could decide in their own time and let me know the conclusion. Since I was working on the sniper rifle, I made it so that the moon toggled through three sizes (small, medium, large) as the player sniped it.
The artists never got back to me so I just left it in. It was still there in SA.
Language models only really need to use an exponential fraction of their neurons for individual inferences. As proof, we present UltraFastBERT, a BERT variant that uses 0.3% of its neurons during inference while performing on par with similar BERT models. UltraFastBERT selectively engages just 12 out of 4095 neurons for each layer inference. This is achieved by replacing feedforward networks with fast feedforward networks (FFFs). While no truly efficient implementation currently exists to unlock the full acceleration potential of conditional neural execution, we provide high-level CPU code achieving 78x speedup over the optimized baseline feedforward implementation, and a PyTorch implementation delivering 40x speedup over the equivalent batched feedforward inference. We publish our training code, benchmarking setup, and model weights.
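The conditional execution behind those numbers can be illustrated with a toy fast feedforward layer: a depth-12 binary tree of neurons where each inference walks a single root-to-leaf path, evaluating only 12 of the 2^12 - 1 = 4095 node neurons. The weights and output mixing below are illustrative, not the paper's exact formulation.

```python
import numpy as np

# Toy sketch of a fast feedforward (FFF) layer. Each inference evaluates
# only one root-to-leaf path of a depth-12 binary tree, i.e. 12 of the
# 4095 node neurons, matching the ratio quoted in the abstract.

rng = np.random.default_rng(0)
depth, dim = 12, 64
n_nodes = 2**depth - 1                      # 4095 neurons in total
W_in = rng.standard_normal((n_nodes, dim))  # one neuron per tree node
W_out = rng.standard_normal((n_nodes, dim))

def fff_forward(x):
    y = np.zeros(dim)
    node, used = 0, 0
    while node < n_nodes:
        act = W_in[node] @ x                # evaluate only this node's neuron
        y += max(act, 0.0) * W_out[node]    # ReLU here; the paper uses GeLU
        node = 2 * node + (1 if act > 0 else 2)  # branch on the sign
        used += 1
    return y, used

y, used = fff_forward(rng.standard_normal(dim))
print(used)  # 12 — only one root-to-leaf path is evaluated
```

The speedup claims come from the fact that the 4083 unvisited neurons never touch memory or compute, which dense matrix-multiply kernels cannot currently exploit.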
This translation layer exists because C++ and its AST (Abstract Syntax Tree) are not, as logic containers, conducive to feeding optimization algorithms. Most optimization logic needs a portable, information-rich, quasi-assembly representation in which all side effects are resolved. That is why C++ (like all other languages) is converted to an IR (Intermediate Representation) before optimizations are performed.
Book of the Week
Do you have any more links our community should read? Feel free to post them in the comments.
Have a nice week. 😉