Organizational Risk to Using Generative AI: Hallucinations in LLM Chatbots
- Oct 17, 2023
- Samuel Karan
- Joseph Nguyen
- Nicholas Ott
- John Seyda
Inaccurate facts, stories, and citations generated by artificial intelligence (“AI”) are not uncommon. Many of these fabrications can be exposed by a reader or end user who checks the cited sources with a traditional search engine. These AI-generated falsehoods have led to negative social and financial consequences for many, notably including OpenAI, the company that created ChatGPT.
Examples of Hallucinations Created by AI Chatbots
Such false statements generated by AI are typically referred to as “hallucinations.” While some are minor and quickly disproved with a search engine check, hallucinations can be complicated and can affect relatively esoteric queries, such as false legal case citations.
In February 2022, Roberto Mata enlisted the aid of New York lawyer Steven Schwartz to commence legal action against Avianca Airlines for an injury sustained while flying with the airline in 2019. Schwartz used ChatGPT for legal research, and the resulting brief cited multiple judicial decisions that did not exist. When the fabricated citations were uncovered in 2023, the court sanctioned Schwartz, turning the case into a widely reported cautionary tale about unverified AI output.
In another high-profile example of AI hallucinations, it was publicly revealed in June 2023 that a defamation lawsuit had been filed against OpenAI by a Georgia radio host after ChatGPT falsely stated that the host had been accused of defrauding a non-profit organization. While the legal foundation for this suit is still being debated, its existence is emblematic of the increasing angst surrounding false or unverifiable information provided by large language model (“LLM”) chatbots, generated by prompts requesting factual information, ranging from legal cases to a radio host’s biography and beyond.
Hallucinations are not a phenomenon confined exclusively to ChatGPT. In February 2023, Google presented a demo of its new chatbot, Bard, demonstrating its ability to answer questions quickly and confidently about the James Webb Space Telescope. While impressive at first glance, it didn’t take long for observant astronomers to point out that one generated factoid (that the telescope “took the very first pictures of a planet outside of our solar system”) was blatantly false. Bard made up the statement and presented it just as convincingly as it did the other statements.
Why AI Chatbots Produce False Information
The causes of these chatbot hallucinations are not entirely understood.
One leading theory is that chatbot hallucinations may occur when the chatbot is asked to retrieve information that it does not have stored in its dataset. Attempting to answer the user's prompt regardless, the chatbot extrapolates from existing data to provide an answer. In attempting to "fill in the blanks" with plausible content, the chatbot produces unreliable and/or counterfactual information. This issue is compounded by the casual, conversational tone of AI generators’ outputs, which allows the generator to deliver incorrect information just as convincingly as it delivers correct information.
As OpenAI CEO Sam Altman tweeted in December 2022 in reference to ChatGPT’s capabilities, “it does know a lot, but the danger is that it is confident and wrong a significant fraction of the time.”
OpenAI’s introductory blog post for ChatGPT states that “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers,” a clear reference to hallucinations. OpenAI provides three reasons why solving these hallucinations is challenging:
- During the reinforcement learning (“RL”) training for ChatGPT, there is not currently a source of truth for the AI to reference. In other words, ChatGPT has no way to understand if the information that it is outputting is true or false.
- When the AI is trained to be more cautious, it has an increased tendency to decline to answer questions that it could otherwise answer correctly.
- Human-supervised training tends to mislead the model. This method of training puts too much emphasis on what the human trainer knows, rather than what the model itself knows.
Issues like these make it apparent that the road to solving or mitigating hallucinations is not clear, and as a result, it would be unwise for the end user to trust a chatbot’s output blindly. It is important for the end user to recognize that the primary purpose of LLMs is to “pick the next best word based on statistical probability against their training set,” as software developer Simon Willison puts it.
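Willison’s point can be illustrated with a toy next-word model. The corpus and logic below are a hypothetical, drastically simplified sketch of the statistical idea, not how production LLMs actually work: real models use learned neural networks over tokens, but the core behavior (emitting whichever continuation is statistically likely, with no step that checks truth) is the same in spirit.

```python
from collections import Counter, defaultdict

# A tiny invented training corpus (hypothetical example).
corpus = (
    "the telescope took pictures of a planet and "
    "the telescope took pictures of a star and "
    "the telescope worked"
).split()

# Count which word follows which: a minimal "statistical" next-word table.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def next_word(prev):
    """Pick the most frequent next word seen in training; no truth check."""
    options = followers[prev]
    return options.most_common(1)[0][0] if options else None

# Generate text one word at a time, choosing only by frequency.
word, output = "the", ["the"]
for _ in range(8):
    word = next_word(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```

Nothing in this loop ever asks whether the generated sentence is true; it only asks which word usually comes next, which is precisely why fluent-sounding output can still be wrong.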
Why Organizations Should be Concerned About Falsehoods Created by AI Chatbots
Hallucinations currently present a significant roadblock to any organization or individual hoping to monetize AI’s marketed ability to provide readily accessible information efficiently and reliably, at times even complete with citations and without the need for human intervention.
While the ability of AI to seemingly invent information is fascinating from a scientific perspective, it is problematic in a business environment. Users should be wary when dealing with this new and exciting technology so that they do not unintentionally expose themselves to the risks of unchecked hallucinations.
Even when chatbots are used properly and safely, risks remain. Although specific protocols exist for the best use of generative AI, the risk of misinformation persists. Because a chatbot is trained on a fixed set of data, it may produce improper responses if that dataset contains bias or discriminatory content. In that case, the bias can compound over time and increasingly influence the chatbot’s outputs.
Protecting Your Organization from AI Chatbot Hallucinations
To lessen the risk of unwanted or imprecise outputs, any user should also be careful to avoid certain types of prompts when querying these AI tools. Generally, prompts that depend on the more complicated aspects of human understanding and reasoning should be avoided.
AI has the potential to be a powerful tool when used judiciously and safely; its existence nevertheless introduces new risks to any organization. For organizations that endeavor to implement AI technology, thoughtful consideration toward employing best practices in securing and protecting confidential information is paramount.
Applying strategies to verify datasets used by LLMs, including frequent validation tests to safeguard the accuracy and reliability of generated information, is crucial. Organizations must understand that human intuition and critical thinking are necessary to provide context and trustworthiness to generated information.
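As one hypothetical illustration of such a validation test, an organization could screen every citation in a chatbot-drafted document against a curated, trusted reference set and route anything unmatched to a human reviewer before the output is used. All case names, the `KNOWN_CASES` set, and the draft data below are invented for the sketch; a real deployment would query an authoritative legal database instead.

```python
# Minimal sketch of an output-validation test: flag generated citations
# that cannot be matched against a curated reference set. All data here
# is hypothetical, standing in for a trusted legal citation database.

KNOWN_CASES = {
    "Smith v. Jones, 12 F.3d 345 (2d Cir. 1993)",
    "Doe v. Acme Corp., 98 F. Supp. 2d 765 (S.D.N.Y. 2000)",
}

def unverified_citations(generated_citations):
    """Return every citation that does not appear in the trusted set."""
    return [c for c in generated_citations if c not in KNOWN_CASES]

# A chatbot draft mixing one verifiable citation with one invented one.
draft_citations = [
    "Smith v. Jones, 12 F.3d 345 (2d Cir. 1993)",
    "Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)",
]

flagged = unverified_citations(draft_citations)
if flagged:
    print("Requires human verification:", flagged)
```

A check like this does not eliminate hallucinations, but it converts silent fabrication into an explicit review step, which is the practical goal of the validation strategies described above.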
Overall, AI has the potential to substantially augment business practices, but the risks posed by hallucinations and other imperfections in the technology mean that organizations should tread carefully when implementing generative AI technologies and should not trust generated information without verifying it first.
Samuel Karan is a Staff I within EisnerAmper Digital and provides business process and IT control testing services.