Polymath Engineer Weekly #74
ChatGPT, Production Code, Hashing, SMTP, Junk, Profilers and a Giant Hairball
Hello again. Enjoy the last post of this year ;)
Links of the week
Pushing ChatGPT's Structured Data Support To Its Limits
Structured data and system prompt engineering save a lot of time and frustration when working with the generated text, as you can gain much more determinism in the output. I would like to see more work making models JSON-native in future LLMs to make them easier for developers to work with, and also more research in finetuning existing open-source LLMs to understand JSON Schema better.
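To make the "determinism" point concrete, here is a minimal sketch of checking a model's reply against a small JSON Schema-style description before using it. The schema, the `check_reply` helper and the sample reply are all hypothetical and written for illustration; they are not taken from the linked article, and the check covers only a tiny subset of JSON Schema (required keys and basic types).

```python
import json

# Hypothetical schema describing the shape we asked the model to return.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array"},
    },
    "required": ["title", "tags"],
}

# Maps the JSON Schema type names used above to Python types.
TYPE_MAP = {"string": str, "array": list, "object": dict}


def check_reply(reply_text: str, schema: dict) -> dict:
    """Parse a reply and verify required keys and basic types.

    Raises ValueError when the reply is not valid JSON or does not match.
    """
    data = json.loads(reply_text)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for key in schema["required"]:
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    for key, spec in schema["properties"].items():
        if key in data and not isinstance(data[key], TYPE_MAP[spec["type"]]):
            raise ValueError(f"wrong type for key: {key}")
    return data


if __name__ == "__main__":
    # Hypothetical model output, hard-coded for the example.
    reply = '{"title": "Weekly links", "tags": ["llm", "json"]}'
    print(check_reply(reply, SCHEMA))
```

A full validator would handle nested schemas, enums and formats; the point here is only that a structured reply can be checked mechanically, which free-form text cannot.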
We Have To Support Every Line of Production Code Forever
Regardless of whether a big customer pays us for a one-off enhancement or we give it to them for free, our responsibility is clear: this bit of code needs to work as promised, and continue working as promised, for as long as that customer has it in production. If it breaks three years from now — or has to change to work with changes in their other various systems — we’ll be expected to fix/change/adapt/improve it to meet purpose. Most enterprise systems last for 7-10 years: that means 7-10 years of having someone on the product staff who knows it exists and someone on the technical staff who understands it enough to make repairs/improvements.
But what if we take a nominally secure hash like SHA-256? Can we attack that, without exploiting any structural weaknesses in the hash function itself?
The short and boring answer is "no, SHA-256 is secure." But what if we consider a weakened version? How much weaker do we have to make it before we can generate collisions? There's more than one way to weaken a hash, but in this article we'll consider truncation: throwing away some portion of the hash digits.
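The truncation idea can be tried out in a few lines. The sketch below is my own illustration, not the linked article's code: it keeps only the first few hex digits of each SHA-256 digest and remembers every truncated digest it has seen until two different inputs share one (a birthday-style search). The function names and the choice of counter strings as inputs are assumptions made for the example.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, n_hex: int) -> str:
    """Return the first n_hex hex digits of the SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()[:n_hex]


def find_collision(n_hex: int):
    """Find two distinct inputs whose truncated digests are equal.

    Stores every truncated digest seen so far, so memory grows with the
    number of attempts. Only practical for short truncations.
    """
    seen = {}  # truncated digest -> input that produced it
    for i in count():
        msg = str(i).encode()
        digest = truncated_sha256(msg, n_hex)
        if digest in seen:
            return seen[digest], msg, digest
        seen[digest] = msg


if __name__ == "__main__":
    first, second, digest = find_collision(6)  # 6 hex digits = 24 bits
    print(first, second, digest)
```

With 24 bits kept, a collision is expected after a few thousand attempts, roughly the square root of the 2^24 possible values. Each extra hex digit multiplies the expected work by about four, which is why the full 256-bit digest stays out of reach for this approach.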
SMTP Smuggling - Spoofing E-Mails Worldwide
By exploiting interpretation differences of the SMTP protocol, it is possible to smuggle/send spoofed e-mails - hence SMTP smuggling - while still passing SPF alignment checks. During this research, two types of SMTP smuggling, outbound and inbound, were discovered. These allowed sending spoofed e-mails from millions of domains (e.g., admin@outlook.com) to millions of receiving SMTP servers (e.g., Amazon, PayPal, eBay).
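As a toy model of what an "interpretation difference" can look like, consider the end-of-data marker. The SMTP standard ends a message body with `<CR><LF>.<CR><LF>`. The sketch below assumes one parser that accepts only that sequence and a second, hypothetical lenient parser that also accepts a bare `<LF>.<CR><LF>`. It does not speak real SMTP; the byte stream, addresses and function name are made up for illustration.

```python
# End-of-data markers each toy parser is willing to accept.
STRICT_EOD = (b"\r\n.\r\n",)
LENIENT_EOD = (b"\r\n.\r\n", b"\n.\r\n")  # assumption: also accepts bare LF


def split_messages(stream: bytes, terminators) -> list[bytes]:
    """Split a byte stream into messages at the earliest accepted marker."""
    messages = []
    rest = stream
    while rest:
        hits = [(rest.find(t), t) for t in terminators if t in rest]
        if not hits:
            messages.append(rest)  # no marker left, keep the remainder
            break
        pos, term = min(hits)  # earliest marker wins
        messages.append(rest[:pos])
        rest = rest[pos + len(term):]
    return messages


# One submitted "message" that hides a second one behind a bare-LF marker.
STREAM = (
    b"Subject: hello\r\n\r\nfirst body"
    b"\n.\r\n"  # not an end-of-data marker for the strict parser
    b"MAIL FROM:<admin@example.com>\r\nsecond body"
    b"\r\n.\r\n"
)

if __name__ == "__main__":
    print(len(split_messages(STREAM, STRICT_EOD)))   # strict view
    print(len(split_messages(STREAM, LENIENT_EOD)))  # lenient view
```

The strict parser sees a single message while the lenient one sees two, so whatever checks were applied to the single message by the first system never covered the second one.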
About five years ago, for seven dollars, I bought an old citrus juicer at a thrift shop. […] The thing worked beautifully, almost like new, so I looked up its serial number on the internet to see when the unit was manufactured, guessing it might be almost forty years old. Wrong. It dated to the 1940s. It was seventy, the stubborn monster, still giving satisfaction with every use.
Performance engineering, profilers, and seeing the invisible
Profilers show you “what’s there:” they show you, very concretely and literally, where a particular execution of a program is spending time, usually in terms of functions or stack frames. This is, certainly, useful information.
However, it’s not enough! Sometimes you stumble on an obvious hot-spot caused by an obvious bug, but in many more cases — especially once a program has been optimized somewhat already — you need additional information to decide what actions to take or optimizations to attempt.
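For readers who have not used one, this is roughly what the "what's there" view looks like with Python's built-in `cProfile`. The workload (`slow_sum`, `work`) and the `profile` helper are hypothetical stand-ins chosen for the example, not code from the linked article.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately naive loop so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def work() -> list:
    """Hypothetical workload: call the hot function many times."""
    return [slow_sum(20_000) for _ in range(50)]


def profile(func) -> str:
    """Run func under cProfile and return a short report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile(work))
```

The report says that `slow_sum` dominates the run, but not whether it could be faster, how often it needs to be called, or what the machine is theoretically capable of. That missing context is the "invisible" part the excerpt refers to.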
Book of the Week
Orbiting the Giant Hairball: A Corporate Fool's Guide to Surviving with Grace
Do you have any more links our community should read? Feel free to post them in the comments.
Have a nice week. 😉
Have you read last week's post? Check the archive.