The Achilles Heel of LLMs

AI chatbots make mistakes, repeatedly outputting the same words, bigrams, and trigrams.

If I use your AI tool for months and exchange hundreds of thousands of words with it, it's a good bet that I know what I'm talking about when I criticize its weaknesses. I speak of the flaws that occur when generative AI chatbots such as ChatGPT, Gemini, and Meta AI repeat words, bigrams, and trigrams in their output.

This isn’t going to be a long post. The results of some testing I conducted reveal everything you need to know in short order.

I began with Meta AI, asking it to complete this task.

After the chatbot answered my question, I prompted it to create this.

I’ve uploaded Meta’s AI-created text to this Google document. Whenever the chatbot stopped writing, I prompted it to continue. I stopped asking it to produce more text at 1,045 words because I could already observe how problematic the output was.

The results were atrocious for such a small amount of AI-generated text. This screenshot from the Word Counter website, which I used to tally the repetitions, shows what occurred.
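If you want to run the same kind of check yourself without a website, a few lines of Python will do it. This is a minimal sketch, not the Word Counter site's actual method; the sample sentence is an invented illustration.

```python
from collections import Counter
import re

def ngram_counts(text, n):
    """Count n-gram frequencies in a text (case-insensitive, letters and apostrophes only)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Build overlapping n-grams by zipping n staggered copies of the word list.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample mimicking the kind of repetition described in this post.
sample = "her eyes widened and her eyes narrowed and her eyes widened again"
print(ngram_counts(sample, 2).most_common(3))  # most frequent bigrams
print(ngram_counts(sample, 3).most_common(3))  # most frequent trigrams
```

Paste a chatbot's output into `sample` (or read it from a file) and the most repeated bigrams and trigrams surface immediately.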

I next prompted and tested Gemini with similar parameters.

The 3,280 words that Gemini quickly created are stored in this Google Document.

This is a screenshot of the test results for Gemini.

As you can observe, even though Gemini created three times as much text, problematic overuse of words, bigrams, and trigrams again appeared throughout.

My final test was conducted on ChatGPT.

Output from my testing on ChatGPT can be accessed in this Google Document.

Of the three AI tools, ChatGPT, in my opinion, performed the best. It produced 2,725 words before I stopped prompting it to continue. You can view the results in this screenshot.

Each model tends to bring its own unique problems to the table. Observe the overuse of "eyes" in the Gemini and ChatGPT output.

If you're like me, you're probably wondering why AI companies don't fix these readily apparent flaws. As for what's causing them, I have no clue. Critics of the technology like Gary Marcus would probably refer to these aberrations as hallucinations. I'm not an AI engineer, so I can't shed any light on the origin of the problem. But we as end users have a vested interest in seeing this troubling issue addressed and fixed as quickly as possible.

The biggest headache I had when creating my novel, AI Machinations: Tangled Webs and Typed Words, was fixing this word-salad mess that endlessly appeared in the generated text. And the consequences can be far more dire: students who use these AI tools can get in trouble when an AI detector easily picks up on the patterns I just showed you.

I’ve been sharing this information with other AI power users on X, and it was shocking to learn how many of them had no idea this was happening. Well, now you know.

My hope is that by exposing these flaws in large language models (LLMs), the companies that produced them will step up and fix these problems. Until they do, be forewarned about what you're exposing yourself to when using them.

If you didn't know, grammar tools such as ProWritingAid can help clean this text up, but it's a lot of work, and time that could be better spent elsewhere.
