Aaron Dong

22.10.25 The clear push (or at least the current phase of experimentation) by today's major players is an agentic experience centred around the chat interface. We are likely to see a shift towards voice agents or voice browser control by mid-2026. I remain skeptical that either of these will survive as viable long-term usage patterns. They seem like a contrived way of leveraging the general capacities of contemporary models to justify capital expenditure on compute. As has always been the case, good, intuitive software for humans means you shouldn't need a vision-language model to help you navigate, or a large language model making ten tool calls to complete an action for you. The overwhelming question, and the frantic pushes, are made in the name of productising large language models, with rapid, sloppy experimentation by fractional-trillion-dollar companies. The artifice that AI labs previously maintained about superintelligence and knowledge worker automation has quietly slipped away.

25.08.25 The artificial superintelligence and knowledge worker replacement (or 10x improvement) narrative that a subset of the scientific and speculative industries has pushed for the past few years has become factually untenable. What is true, however, is that we have undergone an irreversible cultural shift in how we access and create information. People, including myself, have started delegating information discovery and processing to text generators. Language models have become a very useful tool for accessing information in a way that is personalised and easy to understand. They can help individuals more easily complete knowledge work outside their domain of expertise. But they have cognitive deficits: no creativity, little capacity for lateral thinking, and an overreliance on memorisation and brute-force pattern recognition rather than abstract reasoning (higher-level persistent abstractions beyond language). They can be useful informational companions but are unable to autonomously complete novel work. They can assist users in completing novel work, but the cognitive demands they place on the user may exceed the value they create, especially when producing high-quality novel content.

15.08.25 Large language model development is about making smaller, more efficient models. Model distillation, inference optimisation and parameter efficiency are being addressed by open-source small language models (e.g. Gemma, Llama, DeepSeek, Qwen). As the tooling and data formats become more standardised, we'll get more accessible and powerful local models. In pushing language models to be smaller, we may find interesting emergent properties as optimisations are made that de-prioritise memorisation. However, no matter the scale of a language model, instruction-following without explicit program synthesis might never be reliable. I would caution against over-reliance on agents. Just find or make good software.
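
To make the distillation point concrete, here is a minimal sketch of logit distillation: a small student model is trained to match the temperature-softened output distribution of a larger teacher. It assumes PyTorch and a hypothetical Hugging Face-style model interface (the .logits attribute); it illustrates the general technique, not any particular lab's recipe.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # KL divergence between temperature-softened teacher and student
        # distributions, scaled by T^2 (the standard Hinton-style objective).
        t = temperature
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * (t * t)

    def distillation_step(student, teacher, input_ids, optimiser):
        # One training step: the small student mimics the frozen teacher.
        with torch.no_grad():
            teacher_logits = teacher(input_ids).logits  # assumed HF-style API
        student_logits = student(input_ids).logits
        loss = distillation_loss(student_logits, teacher_logits)
        loss.backward()
        optimiser.step()
        optimiser.zero_grad()
        return loss.item()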

12.08.25 The era of language model scaling is over. It is self-evident that no further gains in utility are possible through making models bigger. We need to look to different data sources, architectures and ideas, which will take time and patience to mature. Agents are an attempt at real-world applications but are inevitably constrained by the reliability of the underlying language model. Being easily verifiable is necessary but not sufficient for a task to suit agents. If a task is easily verifiable, there is often low value in its completion, or it may be better suited to traditional program synthesis. If a task is not easily verifiable, verifying it often demands the same or greater expertise and effort from a human than the task itself. Test-time training is the new frontier if we are to unlock the ability to build generalisable systems. But it may not be as simple as fine-tuning language models on reward signals. Our current models have no lasting memory, and memorisation of language patterns is intertwined with their cognitive function.
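
For illustration, here is a minimal sketch of one common form of test-time training: briefly fine-tuning a throwaway copy of the model on a self-supervised objective derived from the test input itself (next-token prediction on the prompt) before making the actual prediction. The HF-style model interface (labels argument, .loss attribute) is an assumption, and the fact that the adapted copy is discarded afterwards is exactly the "no lasting memory" problem.

    import copy
    import torch

    def test_time_adapt(model, input_ids, steps=4, lr=1e-5):
        # Adapt a copy of the model on the test input itself; the base
        # weights stay untouched.
        adapted = copy.deepcopy(model)
        adapted.train()
        optimiser = torch.optim.SGD(adapted.parameters(), lr=lr)
        for _ in range(steps):
            # Self-supervised signal: language modelling on the prompt.
            out = adapted(input_ids, labels=input_ids)  # assumed HF-style API
            out.loss.backward()
            optimiser.step()
            optimiser.zero_grad()
        adapted.eval()
        return adapted  # discarded after use: nothing persists across queries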