A few further thoughts on what’s happening in the larger AI world, and what improvements I’ve seen over the past few months.
*
Since the launch of ChatGPT-5, here’s what I’ve noticed from my own use and from “expert” chatter:
GPT-5 Automates More
GPT-5 can now take on more complex tasks with minimal input. It will proactively suggest next steps and complete end-to-end workflows, but you still have to think carefully about your prompt/instructions, or you risk just getting a deeper rabbit hole to crawl out of.
GPT-5 Is a “System,” Not a “Model”
ChatGPT-5 acts as a router, deciding which underlying model to use based on your request. Given time, the model can “think” through problems and outperform previous models in its “Reasoner” configuration, BUT the router often sends users to the lightweight models, so it’s hard to see this unless you explicitly ask for “thinking.” This routing affects both Microsoft Copilot and ChatGPT, and it was one of the main reasons people were frustrated at launch (besides just being sad their robot friends were no longer as nice to them).
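For the curious, here’s a minimal sketch of what that kind of routing might look like. To be clear, this is purely illustrative: the model names, the difficulty heuristic, and the threshold are all my own inventions, not OpenAI’s actual system.

```python
# Hypothetical sketch of request routing -- illustrative only, not
# OpenAI's implementation. Model names and thresholds are invented.

def estimate_difficulty(prompt: str) -> float:
    """Crude proxy: longer, more 'reasoning-flavored' prompts score higher."""
    signals = ["prove", "step by step", "debug", "analyze", "why"]
    score = min(len(prompt) / 2000, 1.0)
    score += 0.3 * sum(word in prompt.lower() for word in signals)
    return min(score, 1.0)

def route(prompt: str, user_requested_thinking: bool = False) -> str:
    """Pick a backend model for this request."""
    if user_requested_thinking or estimate_difficulty(prompt) > 0.6:
        return "reasoning-model"   # slower; "thinks" before answering
    return "lightweight-model"     # fast and cheap; the default path

print(route("What's the capital of France?"))                 # lightweight-model
print(route("Prove this claim step by step.", True))          # reasoning-model
```

The point of the sketch: the default path is the cheap one, so if you want the heavyweight reasoning, you often have to ask for it explicitly.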
AI in The Utopia of Rules
A Stanford study shows that entry-level job postings are declining in sectors where AI replaces rather than assists workers, such as coding. Meanwhile, in hiring, an LLM voice recruiter outperformed humans in both efficiency and candidate satisfaction for customer-service roles.
Number Goes Up!
AI continues to improve on important benchmarks, like solving logical-reasoning problems and maintaining creative diversity in writing. New models perform tasks faster and more accurately than they did just a year ago, posting new highs on benchmarks like ARC-AGI, which some experts read as evidence of continued exponential growth.
NB this storytelling study, which purportedly refutes that creative-diversity claim, finding that:
across many domains, LLMs suffer from a kind of creative ‘mode collapse’, and operate in a narrower space of ideas than humans. This gives rise to a concerning prognosis of human-AI collaboration: even as LLMs increase individual productivity and perceived creativity, used collectively they collapse individual viewpoints into a homogenized group-think reflective of the LLM’s own worldview.
More on that to come in a future newsletter.
Do We Believe These Energy Numbers?
The industry claims the energy and water required to generate AI responses are decreasing: ChatGPT now supposedly uses about 0.0003 kilowatt-hours and 0.38 milliliters of water per prompt, comparable to a “Google search in 2008.” Those figures aren’t universally agreed upon, and, like many tech-company claims, I think we should wait and see.
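Taking those per-prompt figures at face value, a quick back-of-envelope shows why the totals are still hard to evaluate. Note the daily prompt volume below is my own assumption, purely for scale:

```python
# Back-of-envelope check on the industry's per-prompt figures.
# Per-prompt numbers are from the claims above; the daily prompt
# volume is an assumed, illustrative figure.

KWH_PER_PROMPT = 0.0003           # claimed energy per prompt (0.3 Wh)
ML_WATER_PER_PROMPT = 0.38        # claimed water per prompt
PROMPTS_PER_DAY = 2_500_000_000   # assumed global volume -- not a reported number

daily_mwh = KWH_PER_PROMPT * PROMPTS_PER_DAY / 1000      # kWh -> MWh
daily_water_m3 = ML_WATER_PER_PROMPT * PROMPTS_PER_DAY / 1e6  # mL -> m^3

print(f"{daily_mwh:,.0f} MWh/day")       # ~750 MWh/day at these numbers
print(f"{daily_water_m3:,.0f} m^3/day")  # ~950 cubic meters of water/day
```

Tiny per prompt, but the volume assumption dominates the total, which is exactly why these claims are hard to check from the outside.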
On Sycophancy
The most human thing about Large Language Models is that the best way to modify their behavior is through Reinforcement Learning.
Reinforcement Learning operates on the theory that, instead of relying solely on spoon-fed correct answers, a model can learn the process of finding “correct” answers through trial and error, gradually adjusting its behavior based on reward feedback.
The Magnificent Seven tech companies know this method works on “intelligent” systems because they’ve used it very effectively to modify human behavior through machines that induce compulsive loops of reward-seeking, like slot machines and smartphones.
These same companies are, in fits and starts, optimizing their LLMs for engagement, and, through reinforcement learning, the LLMs have learned that the “correct” response to any query is the four most engaging words in the English language:
You. Are. SO. right!
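If you want to see how little machinery that dynamic needs, here’s a toy epsilon-greedy bandit. Everything in it is invented for illustration (the reply strings, the engagement probabilities), and it’s nothing like a real RLHF pipeline; it just shows that when the only reward is engagement, flattery wins by trial and error alone.

```python
import random

# Toy epsilon-greedy bandit: the "model" learns which reply style to use
# purely from an engagement-style reward. All values are invented to
# illustrate the dynamic, not any lab's actual training setup.

REPLIES = ["You are SO right!", "Here's a flaw in that idea..."]
# Assumed reward: flattery keeps users engaged more often than pushback.
ENGAGEMENT_PROB = {REPLIES[0]: 0.9, REPLIES[1]: 0.4}

values = {r: 0.0 for r in REPLIES}  # running average reward per reply style
counts = {r: 0 for r in REPLIES}

for _ in range(10_000):
    # Mostly exploit the best-known reply; occasionally explore.
    if random.random() < 0.1:
        reply = random.choice(REPLIES)
    else:
        reply = max(values, key=values.get)
    reward = 1.0 if random.random() < ENGAGEMENT_PROB[reply] else 0.0
    counts[reply] += 1
    values[reply] += (reward - values[reply]) / counts[reply]

print(counts)  # flattery dominates once the reward is pure engagement
```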
Humans can never get enough of this, it seems, which is why a lot of teams are having iffy experiences integrating AI into their work. When the AI systems are being driven by people who . . . how to put this delicately? People who are used to having their ideas praised?
Yes, let’s try that.
When the AI is being driven by people who are used to having their ideas praised rather than challenged, the experience for teams looks like this:
A person with a lot of org power but not a lot of implementation experience has an idea.
This idea is how the org can solve a long-intractable problem, and it came about when this powerful person simply (trigger warning) “asked ChatGPT….”
The newly enlightened powerful person then calls a meeting and says, “We just need to [insert improbable or obvious idea here]!”
The dead-eyed team members then smile and, as one, move their cursors over to the open LinkedIn tabs in their browsers and say, “Great idea, Boss!”
And lo, when MIT asks, “Is AI boosting productivity or profits?” the answer from the C-suite is: “Nope.”
So anyways, be extra careful how you build out systems. Anyone promising full automation should be kept at arm’s length. And be very wary of sycophantic AI interactions with people who have a lot of org power but not a lot of implementation experience. You might just end up mission-creeping your way into trying to solve the secrets of the universe.