OPINION: This is the second in a two-part series looking at how to minimise AI hallucinations.
ARTIFICIAL intelligence’s tendency to hallucinate is not going away any time soon. According to the Vectara Hallucination Leaderboard, large language models such as Claude and ChatGPT still produce inaccurate responses in up to 10.8 per cent of cases.
At the same time, AI is being blamed for sweeping layoffs across the tech industry. How can we safeguard accuracy and truth if humans are removed from the process, particularly considering the people building these systems don’t fully understand what hallucinations are or why they happen?
“They are essentially what you’d call black box systems,” said Nofil Khan, founder of Sydney-based AI consultancy Avicenna.
“We’ve designed the algorithms, we’ve given it the data and the output is that it works really well. But we don’t actually know what’s happening in the middle.”
Step one
The first step in reducing AI hallucinations is for individuals to ask whether they need AI in the first place.
“A lot of companies have come to me and said, ‘Hey, we want to implement AI’, and I say ‘Okay, cool, why’? And then there’s no answer. Or they say the competitor is doing it so they have to do it,” Mr Khan said.
As a result, businesses are deploying AI in places where it is neither necessary nor strategically sound.
Agentic AI
While LLMs such as Claude and ChatGPT are designed to create content, agentic AI uses LLMs to manage other tools rather than attempting to do everything itself.
Wei Liu, associate professor of computer science and software engineering at The University of Western Australia, described AI agents as being like the conductor of an orchestra.
Along those lines, Mr Khan said agentic AI drew on a set of tools, selecting the precise tool needed for each job.
For example, users could create an executive assistant agent that triaged and responded to emails.
While still powered by LLMs, AI agents could help reduce hallucinations due to the many precise, non-LLM tools at their disposal.
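The tool-selection idea above can be sketched in a few lines. This is an illustrative toy, not any vendor's API: the keyword routing stands in for the LLM's "choose a tool" step, and the tool names are hypothetical.

```python
# Minimal sketch of agentic AI: the model routes work to precise,
# deterministic tools instead of answering everything itself.

def add_numbers(a: float, b: float) -> float:
    """Deterministic maths tool -- it cannot hallucinate an answer."""
    return a + b

def lookup_contact(name: str, directory: dict) -> str:
    """Exact lookup backed by real data rather than model memory."""
    return directory.get(name, "not found")

TOOLS = {"add": add_numbers, "contact": lookup_contact}

def run_agent(task: str, **kwargs):
    """Stand-in for the LLM's tool-selection step: route by keyword."""
    if "add" in task:
        return TOOLS["add"](kwargs["a"], kwargs["b"])
    if "contact" in task:
        return TOOLS["contact"](kwargs["name"], kwargs["directory"])
    raise ValueError("no suitable tool for this task")
```

Because the arithmetic and the lookup run as ordinary code, the agent's final answer for those jobs is exact even though an LLM chose the tool.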
AI harnesses
“An AI harness is the system that holds and guides an AI model,” Mr Khan told Business News.
“The harness restricts the model’s ability and allows the model to understand what tools it can use and when to use them.
“If the AI model is the handyman, the harness is the toolbox and the guide that tells it which tools to use and when to use them.”
Harnesses helped reduce hallucinations by making sure the AI did what the user wanted, he said, in addition to being able to check its own work.
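Mr Khan's toolbox analogy can be sketched as an allow-list wrapper. This is a simplified illustration, not a production harness: the class name, the allow-list and the validator are assumptions made for the example.

```python
# Illustrative AI harness: restrict which tools the model may call,
# and check each result before it reaches the user.

class Harness:
    def __init__(self, allowed_tools: set, validator):
        self.allowed_tools = allowed_tools  # tools the model is permitted to use
        self.validator = validator          # check on the tool's output

    def call(self, tool_name: str, func, *args):
        # The harness, not the model, decides whether the tool may run.
        if tool_name not in self.allowed_tools:
            raise PermissionError(f"tool '{tool_name}' is not permitted")
        result = func(*args)
        # "Check its own work": reject output that fails the validator.
        if not self.validator(result):
            raise ValueError("output failed the harness's check")
        return result
```

The model proposes a tool call; the harness enforces the boundary and vets the result, which is where the reduction in hallucinated output comes from.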
RAG
Dr Liu said retrieval augmented generation (RAG) reduced hallucinations by grounding AI responses in approved internal data, such as structured databases or enterprise document collections not usually available on the public internet.
This reduced hallucinations by allowing fine-tuning of the AI to make the model more domain-specific, she said, while giving users greater confidence about the LLM's information sources.
A practical example, Mr Khan said, was that of a law firm that wanted to use its archive of past cases.
“You can ask AI, ‘In our previous cases, note me down all the different situations where xyz happened’,” he said.
“It can then go and sift through hundreds, thousands of cases to try and find the most relevant cases related to this input.”
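The retrieval step Mr Khan describes can be sketched with a toy keyword retriever. Real RAG systems use vector embeddings and an LLM to compose the final answer, but the grounding principle is the same; the function names here are illustrative.

```python
# Toy RAG sketch: retrieve the most relevant documents, then build a
# prompt that tells the model to answer only from that retrieved context.

def retrieve(query: str, documents: list, top_k: int = 2) -> list:
    """Score documents by words shared with the query (embeddings in real life)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, documents: list) -> str:
    """Ground the model in approved internal data rather than its training set."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Because the model is instructed to answer from the retrieved passages, its response is anchored to the firm's own archive instead of whatever it absorbed from the public internet.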
Garbage in, garbage out
None of these approaches can completely eliminate hallucinations. That is why the final, and arguably most important, safeguard is the quality of the data at both ends of the AI process.
At the front end, that meant making sure good data went in, especially when using RAG, Mr Khan said.
“[AI is] only as good as the training data,” he added.
At the back end, human verification remained critical. “At the end of the day, you still need a human to be in the loop,” Dr Liu told Business News.
And beyond verification, there must also be accountability. AI can generate output, but it cannot take responsibility for it.
That still belongs to us.
• Dr Kate Raynes-Goldie is the chief curiosity officer at The Up Next Company, Oceania’s leading LEGO® Serious Play® expert, curiosity keynote speaker and the creator of SUPERCONNECT. She’s been helping people understand and get curious about innovation, tech and the future since 2002
