LLMs have been plagued by hallucinations from the very start. Developers are pouring enormous amounts of money and time into improving these models, yet the problem persists: hallucinations are rife. In fact, some of the newest models – as OpenAI acknowledged when it launched o3 and o4-mini – hallucinate even more than their predecessors.
Not only do these programs hallucinate, they also remain essentially ‘black boxes’. Hallucinations are hard to defend against because they arise from chance: the answers merely seem plausible, which serves some basic use cases but demands extensive human oversight. To anyone who isn’t a subject matter expert, the hallucinations are imperceptible.
These two problems present major barriers to AI’s widespread adoption, especially in regulated industries like law and healthcare where accuracy and explainability are paramount. It’s ironic, since these industries are at the same time often the most likely to benefit from software that can automate information processing at scale. So if current models are failing to overcome these barriers, where can we go from here?
VP of Engineering at UnlikelyAI.
Why most AI is fundamentally untrustworthy, and getting worse
Large Language Models, or LLMs, have taken the world by storm over the past few years. This type of software uses statistical prediction to generate text outputs in response to text inputs. They’re incredible pieces of technology, but nobody knows exactly how they arrive at a specific output. The answers they produce simply happen to satisfy our requests… until they don’t.
Since LLMs use statistics to determine their outputs, they occasionally come up with answers that are incorrect. It’s like betting on a horse race: even a gambler who accounted for every variable affecting every competitor would still sometimes be wrong. When LLMs get it wrong in this way, we call it a ‘hallucination’.
Hallucinations are inherent to LLMs; one cannot have an LLM without them, because their outputs are statistical predictions rather than verified facts. And because LLMs do not truly understand the information they receive and produce, they cannot warn users when they hallucinate. That’s problematic for everyone, but especially so in applications where the stakes are much higher: in law or healthcare, for example.
What symbolic reasoning is, and why it’s key to reliable AI
As OpenAI has essentially just confessed, nobody knows how to solve this problem using current generative AI models. There is, however, a way to solve it using another model: a type of AI that uses ‘symbolic reasoning’ to address the faults inherent to LLMs.
Symbolic reasoning is an old, well-established method for encoding knowledge as clear, logical rules. It represents facts as static pieces of knowledge, which means the software cannot manipulate or interpret them incorrectly. It’s the same kind of technology that lets us run calculations and formulae in spreadsheet software like Microsoft Excel (nobody double-checks whether the arithmetic is right). Symbolic systems prove their trustworthiness through determinism: the same inputs to a symbolic system always produce the same outcome, something an LLM could never guarantee.
Unlike LLMs, symbolic AI lets users see exactly how it reached a decision, step by step, without hallucinating the explanation. When it doesn’t understand the input, or can’t calculate an answer, it says so: just as Excel returns an error message when a formula is entered incorrectly. This means symbolic systems are truly transparent and traceable.
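To make that concrete, here is a minimal, hypothetical Python sketch of how a symbolic system behaves: a handful of hand-written facts, one rule, a deterministic inference step, a trace of every deduction, and an explicit “cannot determine” answer when the facts don’t support a conclusion. The facts, names and rule are invented for illustration; real symbolic engines are far more sophisticated.

```python
# A minimal sketch of deterministic, traceable symbolic reasoning.
# All facts, names and rules here are illustrative, not from any real product.

FACTS = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def infer_grandparents(facts):
    """Rule: if X is a parent of Y and Y is a parent of Z, X is a grandparent of Z."""
    derived, trace = set(), []
    for (p1, x, y) in facts:
        for (p2, y2, z) in facts:
            if p1 == "parent" and p2 == "parent" and y == y2:
                fact = ("grandparent", x, z)
                derived.add(fact)
                trace.append(f"{fact} because parent({x},{y}) and parent({y},{z})")
    return derived, trace

def query(goal, facts):
    derived, trace = infer_grandparents(facts)
    if goal in facts or goal in derived:
        return "PROVEN", trace
    # Unlike an LLM, the system says so when it cannot derive an answer.
    return "CANNOT DETERMINE", trace

status, trace = query(("grandparent", "alice", "carol"), FACTS)
print(status)            # PROVEN -- the same inputs always give the same output
for step in trace:
    print(" ", step)     # every conclusion comes with the reasoning behind it

print(query(("grandparent", "alice", "dave"), FACTS)[0])  # CANNOT DETERMINE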
How neurosymbolic models could be the future of enterprise-grade, auditable AI
The reason we don’t simply use symbolic models for generative AI is that they’re not particularly good at processing language. They lack the flexibility of LLMs. Each approach has its own strengths and weaknesses.
The solution, then, is to combine the strengths of both into a new category of AI: ‘neurosymbolic AI’. Neurosymbolic AI benefits from both the rules-based rigour of symbolic AI and the flexibility of the neural networks that underpin LLMs. The neural side can process unstructured information in documents, while the symbolic side applies explicit rules to produce structured answers the software can explain.
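As a rough illustration of how that division of labour might look, here is a hypothetical Python sketch. The neural step is stubbed out (a real system would call a language model there and validate its output), and the symbolic step applies explicit, auditable rules to the extracted fields and records why it reached its verdict. All field names, values and rules are invented.

```python
# A rough sketch of a neurosymbolic pipeline, with the neural step stubbed out.
# In a real system, extract_fields() would call a language model; here it returns
# hard-coded values so the example stays self-contained and runnable.

def extract_fields(contract_text: str) -> dict:
    """Neural step (stubbed): pull structured fields out of unstructured text."""
    return {"governing_law": "England", "notice_period_days": 30, "liability_cap": 1_000_000}

# Symbolic step: explicit, human-readable rules applied to the extracted fields.
RULES = [
    ("governing law must be England", lambda f: f.get("governing_law") == "England"),
    ("notice period must be at least 60 days", lambda f: f.get("notice_period_days", 0) >= 60),
    ("liability cap must be at least 500,000", lambda f: f.get("liability_cap", 0) >= 500_000),
]

def review(contract_text: str):
    fields = extract_fields(contract_text)       # flexible, neural
    report = []
    for name, check in RULES:                    # rigid, auditable, symbolic
        report.append(f"{name}: {'PASS' if check(fields) else 'FAIL'}")
    verdict = "COMPLIANT" if all(line.endswith("PASS") for line in report) else "NON-COMPLIANT"
    return verdict, report

verdict, report = review("…unstructured contract text…")
print(verdict)                 # NON-COMPLIANT (the notice-period rule fails)
for line in report:
    print(" ", line)
```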
This development is crucial to the adoption of effective AI within business, but especially in heavily regulated industries. In those contexts, it isn’t good enough to say that the program produced an outcome that looks about right but that nobody knows how it got there. It’s imperative, above all, to understand how the program reached its decision. That’s where neurosymbolic AI comes in.
What’s special about this approach is that neurosymbolic AI will admit when it cannot produce an accurate response. LLMs don’t, and will often produce convincing answers anyway. It’s easy to see how useful this could be in insurance, for example, where a neurosymbolic program could process claims automatically and flag cases to trained humans when it is unsure of the right outcome. An LLM would simply make something up.
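Here is a minimal, hypothetical sketch of that escalation behaviour: the symbolic layer either reaches a conclusive decision from the facts it has, or routes the claim to a human rather than guessing. The field names and thresholds are invented for illustration.

```python
# A minimal sketch of claim triage with explicit escalation to a human reviewer.
# All field names and thresholds are invented for illustration.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    policy_active: Optional[bool]    # None means the extraction step was not confident
    claim_amount: Optional[float]

def triage(claim: Claim) -> str:
    # If any required fact is missing or uncertain, refuse to guess.
    if claim.policy_active is None or claim.claim_amount is None:
        return "ESCALATE: insufficient information, route to a human adjuster"
    if not claim.policy_active:
        return "REJECT: policy not active"
    if claim.claim_amount <= 1_000:
        return "APPROVE: below automatic-payout threshold"
    return "ESCALATE: above threshold, route to a human adjuster"

print(triage(Claim(policy_active=True, claim_amount=400)))    # approved automatically
print(triage(Claim(policy_active=True, claim_amount=None)))   # escalated, not guessed
```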
It’s time for us to recognize that, while they’ve certainly pushed the technology forward, our current models of AI have reached an insurmountable wall. We need to take the lessons from the progress we’ve made and seek other solutions that will allow us to approach from a different angle. The most promising of these solutions is neurosymbolic AI. With it, we’ll be able to foster trust in a technology that, in its current format, has none.