Polymath Engineer Weekly #98
Autoencoders, DuckDB, Python's Garbage Collector, Terminals, Go Iterators, PostgreSQL Vectors and Skin in the Game
Hello again.
Comic of the week
Links of the week
Deep Dive into Anthropic’s Sparse Autoencoders by Hand ✍️
Sparse autoencoders help address the problem of ‘polysemanticity’ (neural activations that correspond to several meanings or interpretations at once) by focusing on sparsely activating features that each hold a single interpretation; in other words, features that are more one-directional.
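To make the idea concrete, here is a minimal sketch of a sparse autoencoder's forward pass and loss in plain Python. It is not the code from the linked article: the class name, the layer sizes, the random initialisation and the `l1_coeff` value are all assumptions chosen for illustration, and there is no training loop. It only shows the general shape: a ReLU encoder into more features than inputs, a linear decoder, and an L1 penalty that pushes most features to zero.

```python
import random


def relu(value):
    """Rectified linear unit: negative pre-activations become exactly zero."""
    return value if value > 0.0 else 0.0


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Hypothetical toy model; names and sizes are illustrative assumptions."""

    def __init__(self, n_inputs, n_features, seed=0):
        rng = random.Random(seed)
        # Encoder maps activations to a wider feature space.
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # Decoder maps features back to the original activation space.
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                      for _ in range(n_inputs)]
        self.b_dec = [0.0] * n_inputs

    def encode(self, activations):
        pre = matvec(self.w_enc, activations)
        return [relu(p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        out = matvec(self.w_dec, features)
        return [o + b for o, b in zip(out, self.b_dec)]

    def loss(self, activations, l1_coeff=0.1):
        """Squared reconstruction error plus an L1 sparsity penalty on the features."""
        features = self.encode(activations)
        reconstruction = self.decode(features)
        recon_error = sum((a - r) ** 2 for a, r in zip(activations, reconstruction))
        sparsity_penalty = sum(abs(f) for f in features)
        return recon_error + l1_coeff * sparsity_penalty, features


sae = SparseAutoencoder(n_inputs=4, n_features=16)
total_loss, features = sae.loss([1.0, 0.0, -1.0, 0.5])
active = sum(1 for f in features if f > 0.0)
print(f"loss={total_loss:.3f}, active features={active}/{len(features)}")
```

With untrained random weights the features are not yet interpretable; training would minimise this loss so that each input is explained by only a few active features.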
DuckDB Doesn’t Need Data To Be a Database
Databases have gotten so good at this that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right – most databases no longer have any data in them.
CPython Garbage Collection: The Internal Mechanics and Algorithms
The Python code executes on the interpreter in the context of an OS thread, and the runtime maintains the information about the thread’s state in an object called thread state. The thread state contains information about the code executing on the interpreter in the context of that thread, such as the stackframe pointer, the GC state, and the eval_breaker.
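Some of this state can be observed from Python itself. The sketch below is my own illustration rather than code from the article, and the helper names are made up. It uses `sys._current_frames()` (a CPython-specific function) to list the topmost frame of each running thread, and the standard `gc` module to show the cycle collector reclaiming two objects that reference counting alone cannot free.

```python
import gc
import sys
import threading
import weakref


def current_frame_names():
    """Map each thread id to the name of the function its top frame is running."""
    frames = sys._current_frames()  # CPython-specific snapshot of per-thread frames
    names = {}
    for thread_id, frame in frames.items():
        names[thread_id] = frame.f_code.co_name
    return names


class Node:
    """A tiny object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle():
    """Create two objects that reference each other, then drop all outside references."""
    first, second = Node("first"), Node("second")
    first.partner, second.partner = second, first
    return weakref.ref(first)  # a weak reference does not keep the object alive


def collect_cycle():
    """Return (alive before collection, objects found unreachable, alive after)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from interfering with the demo
    try:
        gc.collect()  # start from a clean slate
        probe = make_cycle()
        alive_before = probe() is not None
        unreachable = gc.collect()
        alive_after = probe() is not None
        return alive_before, unreachable, alive_after
    finally:
        if was_enabled:
            gc.enable()


print(current_frame_names().get(threading.get_ident()))
print(collect_cycle())
```

The exact number of unreachable objects reported by `gc.collect()` can vary between CPython versions, so the sketch only relies on the cycle being alive before the collection and gone after it.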
The TTY demystified
The TTY subsystem is central to the design of Linux, and UNIX in general. Unfortunately, its importance is often overlooked, and it is difficult to find good introductory articles about it. I believe that a basic understanding of TTYs in Linux is essential for the developer and the advanced user.
Go evolves in the wrong direction
Since Go 1.23, for ... range loops can be applied to functions with special signatures (aka pull and push functions). This makes it impossible to understand what an innocent-looking for ... range loop does under the hood just by reading the code. It can do anything, as any function call can. The only difference is that function calls in Go were always explicit, e.g. f(args), while the for ... range loop hides the actual function call. Additionally, it applies non-obvious transformations to the loop body.
How We Made PostgreSQL as Fast as Pinecone for Vector Data
The DiskANN algorithm came out of work at Microsoft. Its goal was to store a very large number of vectors (think Microsoft scale). At that scale, it was simply uneconomical to store everything in RAM. Thus, the algorithm is geared towards storing vectors on SSDs and using less RAM. Its details are described very well in the paper, so I’ll only give a bit of intuition below.
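A quick back-of-the-envelope calculation shows why RAM becomes the bottleneck. The numbers below are assumptions picked for illustration (one billion vectors, 768 dimensions, 4-byte floats), not figures from the article or the paper, and the function names are hypothetical.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_component=4):
    """Bytes needed to hold the raw vectors alone, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_component


def to_terabytes(num_bytes):
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return num_bytes / 10**12


# Assumed workload: 1 billion vectors of 768 float32 components each.
total = raw_vector_bytes(num_vectors=1_000_000_000, dimensions=768)
print(f"{to_terabytes(total):.3f} TB of raw vectors")
```

Under these assumptions the raw vectors alone need about 3 TB before any index structure is added, which is the kind of footprint that makes keeping most of the data on SSD attractive.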
I have created a survey to get feedback from you. It takes only 2 minutes.
Book of the week
Skin in the Game: Hidden Asymmetries in Daily Life
Do you have any more links our community should read? Feel free to post them in the comments.
Have a nice week. 😉
Have you read last week's post? Check the archive.