When Can AI Help Tune Machine Learning? A New Study Shows It's Not Ready Yet

A new research paper posted online in June 2026 asks a practical question: can a large language model (LLM), a type of AI system, do a better job than traditional optimization algorithms at finding the best settings for machine learning systems?
The short answer from the research is no — at least not when the task is narrowly defined.
What Problem Does This Solve?
Building machine learning systems requires lots of small decisions about how to train them. Think of it like cooking: you need to decide on heat level, cooking time, and ingredient amounts. In machine learning, these "settings" are called hyperparameters — things like how fast the system learns and how much data it processes at once.
For years, researchers have built specialized algorithms to find the best combination of these settings automatically. They work by testing many combinations, learning from what works, and homing in on better answers.
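The loop described above (try settings, learn from the results, narrow the search) can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not CMA-ES or TPE: it samples around the best point found so far and shrinks the sampling radius over time. The objective `toy_loss`, its optimum, and the ranges in `BOUNDS` are made up for illustration and do not come from the study.

```python
import random

# Hypothetical search space: log10 of the learning rate, and the batch size.
BOUNDS = {"log_lr": (-5.0, -1.0), "batch_size": (16.0, 512.0)}


def toy_loss(cfg):
    """Made-up stand-in for a training run; lowest at log_lr=-3, batch_size=128."""
    return (cfg["log_lr"] + 3.0) ** 2 + ((cfg["batch_size"] - 128.0) / 128.0) ** 2


def clip(value, low, high):
    """Keep a proposed value inside its allowed range."""
    return max(low, min(high, value))


def narrowing_search(n_trials=60, seed=0):
    rng = random.Random(seed)
    # Start from a uniformly random configuration inside the bounds.
    best = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
    best_loss = toy_loss(best)
    history = [best_loss]
    radius = 0.5  # fraction of each range used as the sampling spread
    for _ in range(n_trials - 1):
        candidate = {}
        for k, (lo, hi) in BOUNDS.items():
            step = rng.gauss(0.0, radius * (hi - lo))
            candidate[k] = clip(best[k] + step, lo, hi)
        loss = toy_loss(candidate)
        if loss < best_loss:  # learn from what works
            best, best_loss = candidate, loss
        radius = max(0.02, radius * 0.95)  # narrow the search over time
        history.append(best_loss)
    return best, best_loss, history
```

Calling `narrowing_search()` returns the best configuration found, its loss, and the best loss after each trial. Real optimizers replace the fixed shrinking rule with a statistical model fitted to past trials.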
A group of researchers led by Fabio Ferreira tried something different. They gave an LLM — the kind of AI that powers chatbots — direct access to the training code itself. Instead of just choosing numbers within a fixed range, the AI could read and edit the actual code. In theory, this gives it much more freedom to make changes.
What Did They Find?
When they set up a fair comparison — requiring the AI to stay within the same range of settings that the traditional methods used — the traditional methods won. Two established algorithms called CMA-ES and TPE consistently found better settings with fewer tries.
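To make the idea of a fair comparison concrete, here is a minimal sketch of a test harness in which every method gets the same allowed ranges and the same number of tries. All names and numbers (`BOUNDS`, `BUDGET`, `run_trials`, the two toy methods) are hypothetical, and the rule that an out-of-range proposal still uses up a try is an assumption made for this example, not a detail reported by the study.

```python
def within_bounds(config, bounds):
    """True if config has exactly the expected settings, each inside its range."""
    if set(config) != set(bounds):
        return False
    return all(low <= config[name] <= high for name, (low, high) in bounds.items())


def run_trials(propose, evaluate, bounds, budget):
    """Give one method `budget` tries; return (best_loss, best_config) or None."""
    best = None
    for trial in range(budget):
        config = propose(trial)
        if not within_bounds(config, bounds):
            continue  # assumption: an out-of-range proposal still uses up a try
        loss = evaluate(config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


# Hypothetical shared rules for every method in the comparison.
BOUNDS = {"learning_rate": (1e-5, 1e-1), "batch_size": (16, 512)}
BUDGET = 5


def toy_loss(config):
    # Made-up objective, lowest at learning_rate=0.01.
    return abs(config["learning_rate"] - 0.01)


def careful_method(trial):
    # Stays inside the allowed ranges on every try.
    return {"learning_rate": 0.02 / (trial + 1), "batch_size": 64}


def careless_method(trial):
    # Every proposal is outside the allowed learning-rate range.
    return {"learning_rate": 1.0, "batch_size": 64}


careful_result = run_trials(careful_method, toy_loss, BOUNDS, BUDGET)
careless_result = run_trials(careless_method, toy_loss, BOUNDS, BUDGET)
```

Because both methods are scored by the same function with the same budget, any difference in the results reflects how well each one uses its tries rather than how much freedom it was given.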
Why does this happen? The traditional algorithms build a mathematical map of the problem space from the settings they have already tested, and they use it to guess which settings are worth trying next. The LLM, by contrast, doesn't have a map. It makes educated guesses based on patterns it learned during training, which sounds smart but isn't as reliable for this particular job.
Why the Code Access Matters
The setup in this study is different from just asking an AI "what settings should I use?" Giving the AI permission to edit actual code is a bigger capability. In theory, the AI could add new features to the training process, switch between different optimization techniques, or restructure how the system works.
But that extra freedom didn't help. Within the strict bounds the researchers set, the ability to modify code actually seemed to add noise instead of useful changes. The traditional methods, because they understood the boundaries of the problem clearly, made better use of each attempt. This follows a pattern seen many times before in computer science: when you know the exact shape of a problem, that knowledge is valuable.
The broader context is that the researchers deliberately limited the search space to create a fair test. The real question for the future is whether AI might perform better in messier, less clearly defined scenarios: situations where traditional methods struggle because nobody has neatly mapped out all the options. That territory might be where an AI system's flexibility becomes an advantage rather than a liability.
How This Fits Into Larger Technology Trends
This result fits a pattern that has repeated many times over decades of AI research. When a new and powerful technology arrives, there is initial excitement about how much it can do. Then comes careful testing that reveals where it genuinely excels and where older methods still win. Deep learning beat traditional techniques on some problems but not others. Neural architecture search, in which algorithms design neural networks automatically, promised full automation but ended up working best with human guidance. Now LLM agents are going through the same calibration.
That is not bad news. It is how technology actually advances — not with one method replacing all others, but through figuring out where each tool does its best work.
What This Means for People Building AI Systems
For engineers who run machine learning systems at large companies, the practical advice is straightforward: when the range of settings is well defined, stick with established methods such as TPE and CMA-ES. They will get you better results per attempt, and they are reliable and reproducible.
The more interesting possibility is combining both approaches. An AI system could handle the big-picture planning of experiments, while a traditional algorithm handles the detailed search for settings. The code from this study makes that kind of hybrid approach easier to test.
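One way to picture that division of labor is the rough sketch below. It is not the study's code. The planner `plan_search_space` is a hypothetical stub with a hard-coded rule, standing in for the place where an LLM would decide what to explore, and plain random search stands in for TPE or CMA-ES so the example runs with only the Python standard library. The objective and ranges are invented.

```python
import random


def plan_search_space(notes):
    """Hypothetical planner. In a real hybrid an LLM would sit here;
    this stub uses a fixed rule so the example runs offline."""
    if "diverged" in notes:
        # Earlier runs blew up, so search only smaller learning rates.
        return {"log_lr": (-5.0, -3.0)}
    return {"log_lr": (-5.0, -1.0)}


def random_search(bounds, evaluate, budget, seed=0):
    """Plain random search, standing in for TPE or CMA-ES."""
    rng = random.Random(seed)
    best_loss, best_cfg = float("inf"), None
    for _ in range(budget):
        cfg = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        loss = evaluate(cfg)
        if loss < best_loss:
            best_loss, best_cfg = loss, cfg
    return best_loss, best_cfg


def toy_loss(cfg):
    # Made-up objective, lowest at log_lr=-4.
    return (cfg["log_lr"] + 4.0) ** 2


# The planner sets the big picture; the search routine fills in the numbers.
bounds = plan_search_space("previous run diverged after 100 steps")
best_loss, best_cfg = random_search(bounds, toy_loss, budget=30)
```

The point of the split is that the planner never picks individual numbers. It only decides where to look, and the numerical search stays inside whatever ranges it is handed.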
The study does not declare that AI systems have no role in machine learning automation. Instead, it draws a clearer line around where they do and do not help right now. In a fast-moving field where hype often outpaces reality, that clarity is genuinely valuable.


