Veteran Windows developer Dave Plummer has returned to his garage, a space filled with computers and nostalgia, to demystify artificial intelligence. In his latest video, he sets out to expose what he calls AI’s “dirty little secret,” a premise captured in the opening line of the video description: “Dave uses a PDP-11 to train a real Neural Network complete with Transformers and Attention so you can see them at their most basic.”
Exploring the PDP-11
For context, Plummer showcases his 47-year-old PDP-11 system, equipped with a modest 6 MHz CPU and a mere 64 KB of RAM. The vintage machine runs a transformer model named ‘Attention 11,’ written in PDP-11 assembly language by Damien Buret. At first glance, the task assigned to the PDP-11 appears trivial: reversing a sequence of eight digits. But the model is never handed the answer; it must learn the structural rule that maps every input to its output, a process Plummer argues mirrors the foundational mechanics of contemporary large language models (LLMs) like ChatGPT.
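To see why attention suits the task, note that reversal is just a fixed routing pattern: output position i copies input position N − 1 − i. The minimal numpy sketch below shows that “reversal map” expressed as an attention matrix; it illustrates the idea only and is not code from Attention 11:

```python
import numpy as np

N = 8                                      # sequence length, as in the video
x = np.array([3, 1, 4, 1, 5, 9, 2, 6])    # an example input sequence

# The "reversal map": an attention matrix in which output position i
# puts all of its weight on input position N-1-i.
A = np.zeros((N, N))
for i in range(N):
    A[i, N - 1 - i] = 1.0

y = A @ x                  # each output is a weighted sum of the inputs
print(y.astype(int))       # [6 2 9 5 1 4 1 3], the input reversed
```

Here the matrix is written by hand; the whole point of the exercise on the PDP-11 is that training discovers this pattern on its own.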
Writing in PDP-11 assembly on hardware this constrained forces relentless optimization, a point Plummer embraces: “Constraints are not the enemy of engineering. Constraints are what force creative engineering to happen.” What may surprise many is how little scaffolding intelligence needs to emerge. The model operates with just 1,216 parameters, employing fixed-point math and reducing precision to 8 bits for the forward pass. Every cycle is meticulously optimized to ensure the machine completes training before “the heat death of the universe.”
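Fixed-point math stores fractional values as scaled integers so the CPU only ever performs integer arithmetic. The sketch below illustrates the general technique in Python; the specific format (a signed 8-bit value with four fraction bits) is an assumption chosen for illustration, not a detail taken from Buret’s implementation:

```python
import numpy as np

FRAC_BITS = 4              # assumed format: signed 8-bit, 4 fraction bits
SCALE = 1 << FRAC_BITS     # so 1.0 is stored as the integer 16

def to_fixed(x):
    """Quantize a float into the 8-bit fixed-point format."""
    return np.int8(np.clip(round(x * SCALE), -128, 127))

def fixed_mul(a, b):
    """Multiply two fixed-point values. The raw product carries twice the
    fraction bits, so shift it back down to stay in the same format."""
    prod = np.int16(a) * np.int16(b)
    return np.int8(np.clip(prod >> FRAC_BITS, -128, 127))

w = to_fixed(0.75)         # stored as 12
x = to_fixed(-1.5)         # stored as -24
y = fixed_mul(w, x)        # stored as -18
print(y / SCALE)           # -1.125, i.e. 0.75 * -1.5
```

The appeal on a machine like the PDP-11 is that every operation reduces to integer multiplies, adds, and shifts, which the hardware handles natively.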
As Plummer observes the training process, he notes, “We’re watching the stripped-down anatomy of learning itself. The model begins dumb. The loss begins high. Accuracy stumbles around like a man trying to assemble IKEA furniture in the back of a moving van. And then somewhere along the way, the weights settle into a pattern. And the attention discovers the reversal map. And the machine crosses that invisible line from guessing into knowing.”
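That arc from guessing to knowing can be reproduced in miniature. The following numpy sketch, far simpler than Attention 11 (floating point rather than fixed point, and a single learnable attention matrix rather than a full transformer), trains attention logits by gradient descent until they discover the reversal map:

```python
import numpy as np

rng = np.random.default_rng(0)
N, V = 8, 10                             # sequence length, digit vocabulary
logits = rng.normal(0.0, 0.1, (N, N))    # learnable attention logits
lr = 1.0

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for step in range(501):
    digits = rng.integers(0, V, N)       # a random training sequence
    X = np.eye(V)[digits]                # one-hot encode, shape (N, V)
    Y = X[::-1]                          # target: the reversed sequence

    A = softmax(logits)                  # attention weights, rows sum to 1
    P = A @ X                            # predicted digit distribution per position
    loss = -np.sum(Y * np.log(P + 1e-9)) / N

    dP = -(Y / (P + 1e-9)) / N           # gradient of the cross-entropy loss
    dA = dP @ X.T                        # back through the matmul
    dlogits = A * (dA - np.sum(dA * A, axis=1, keepdims=True))  # softmax backward
    logits -= lr * dlogits               # plain gradient descent

    if step % 100 == 0:
        acc = (P.argmax(axis=1) == digits[::-1]).mean()
        print(f"step {step:3d}  loss {loss:.3f}  accuracy {acc:.0%}")
```

The hyperparameters here are arbitrary, and the point is modest: a few hundred gradient steps over a few dozen stored numbers are enough for the attention rows to sharpen into the reversal permutation and for accuracy to lock in.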
The results from training on the antiquated 6 MHz computer were impressive: the model reached 100% accuracy on the number-reversing task after approximately 350 training steps, a feat accomplished in about 3.5 minutes on the PDP-11/44 with the help of a cache board. The achievement underscores Plummer’s assertion that modern AI operates on the same mechanical principles, not mystical ones, only at a vastly larger scale. “This old machine is not thinking in some mystical sense. It’s just grinding through arithmetic to update a few thousand carefully stored numbers. And that’s the whole game. The glamour of modern AI mostly comes from doing that on a staggering scale. But the essential act of learning is already here fully in miniature,” he explains.
In closing, Plummer suggests that as the demand for computational resources intensifies, any company that embraces an old-school focus on efficiency and optimization could find itself at a significant advantage in the evolving landscape of AI.