An often overlooked limitation for chatbots is memory. While it’s true that the AI language models that power these systems are trained on terabytes of text, the amount these systems can process when in use — that is, the combination of input text and output, also known as their “context window” — is limited. For ChatGPT it’s around 3,000 words. There are ways to work around this, but it’s still not a huge amount of information to play with.
Now, AI startup Anthropic (founded by former OpenAI engineers) has hugely expanded the context window of its own chatbot Claude, pushing it to around 75,000 words. As the company points out in a blog post, that’s enough to process the entirety of The Great Gatsby in one go. In fact, the company tested the system by doing just this — editing a single sentence in the novel and asking Claude to spot the change. It did so in 22 seconds.
You may have noticed my imprecision in describing the length of these context windows. That’s because AI language models measure information not by number of characters or words but in tokens: units of text that don’t map precisely onto these familiar quantities. It makes sense when you think about it. After all, words can be long or short, and their length does not necessarily correspond to their complexity of meaning. (The longest definitions in the dictionary are often for the shortest words.) The use of “tokens” reflects this truth, and so, to be more precise: Claude’s context window can now process 100,000 tokens, up from 9,000 before. By comparison, OpenAI’s GPT-4 processes around 8,000 tokens (that’s not the standard model available in ChatGPT; you have to pay for access), while a limited-release full-fat model of GPT-4 can handle up to 32,000 tokens.
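To make the conversion concrete, here is a back-of-the-envelope sketch in Python. It assumes the ratio implied by the figures above (100,000 tokens for roughly 75,000 words, or about 0.75 words per token). That ratio is only a rule of thumb for English text, not a property of any particular tokenizer, and the function name is made up for illustration.

```python
# Words-per-token ratio implied by the article's figures:
# 100,000 tokens is described as roughly 75,000 words.
# This is an approximation, not something a tokenizer guarantees.
WORDS_PER_TOKEN = 75_000 / 100_000  # 0.75


def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


# Context window sizes quoted in the article, in tokens.
context_windows = {
    "Claude (new)": 100_000,
    "Claude (previous)": 9_000,
    "GPT-4 (standard)": 8_000,
    "GPT-4 (limited release)": 32_000,
}

for name, tokens in context_windows.items():
    print(f"{name}: {tokens:,} tokens is about {approx_words(tokens):,} words")
```

By this estimate, GPT-4’s standard 8,000-token window holds about 6,000 words and the 32,000-token version about 24,000, which puts Claude’s 75,000 in perspective.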
Right now, Claude’s new capacity is only available to Anthropic’s business partners, who are tapping into the chatbot via the company’s API. The pricing is also unknown, but is certain to be a significant bump. Processing more text means spending more on compute.
But the news shows AI language models’ capacity to process information is increasing, and this will certainly make these systems more useful. As Anthropic notes, it takes a human around five hours to read 75,000 words of text, but with its expanded context window, Claude can potentially take on the task of reading, summarizing, and analyzing a long document in a matter of minutes. (Though it doesn’t do anything about chatbots’ persistent tendency to make information up.) A bigger context window also means the system is able to hold longer conversations. One factor in chatbots going off the rails is that when their context window fills up, they forget what’s been said earlier. That’s why Bing’s chatbot is limited to 20 turns of conversation. More context equals more conversation.
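Why does a full context window lead to forgetting? One simple way to picture it is a sliding window over the conversation: once the budget is spent, the oldest messages fall out. The Python sketch below illustrates that idea only. It is not how Claude, ChatGPT, or Bing actually manage conversations; the function names are hypothetical, and the token count is a crude guess based on the rough 0.75-words-per-token ratio implied by Anthropic’s figures.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate, assuming about 0.75 words per token."""
    words = len(text.split())
    return max(1, round(words / 0.75))


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit in the token budget.

    Walks backwards from the newest message and stops at the first
    one that would overflow, so everything older is "forgotten".
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "My name is Sam",          # 4 words, about 5 tokens
    "I like long novels",      # 4 words, about 5 tokens
    "What is my name again",   # 5 words, about 7 tokens
]

# With a small budget, the first message no longer fits.
print(fit_to_window(conversation, max_tokens=12))
# With a large budget, the whole conversation is kept.
print(fit_to_window(conversation, max_tokens=100))
```

With the small budget, the message containing the name has already dropped out by the time the question about it arrives, which is the kind of forgetting described above. A larger budget simply postpones the point at which that happens.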