Context windows are one of the most underappreciated parts of artificial intelligence systems, particularly where chat interfaces are concerned.
These ‘windows’ temporarily store the text of a chat while we interact with an AI model.
By holding those messages in memory, the model can maintain a consistent understanding of the overall conversation.
So, for example, an initial user request might be to find information on the population of Berlin, to which the model returns a response.
However, any follow-up questions, such as asking about the city’s best cafes, need to be connected to the previous request, otherwise the whole conversation grinds to a halt.
By keeping track of the conversation’s history, a context window lets the AI coherently manage lengthy conversations so they flow well, and stay aligned with the overall discussion.
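To make that concrete, here’s a minimal sketch in Python of how a chat interface keeps context. Everything in it is illustrative rather than any particular vendor’s API: the whole message history is resent with each new request, which is what lets a follow-up like ‘the city’s best cafes’ make sense.

```python
# Minimal sketch of how a chat interface maintains context. The
# send_to_model() function is a stand-in for a real model call,
# not any particular vendor's API.

conversation = []  # the context window's contents, oldest message first

def send_to_model(messages):
    """Stand-in for a real model call; returns a placeholder reply."""
    return f"(reply based on {len(messages)} messages of context)"

def ask(user_message):
    # The entire history is sent with every request, which is how the
    # model can connect a follow-up question to what came before.
    conversation.append({"role": "user", "content": user_message})
    reply = send_to_model(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return reply

ask("What is the population of Berlin?")
ask("And what are the city's best cafes?")  # "the city" resolves via the history
```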
Context windows are measured in the total number of tokens the model can process at any one time.
Typical windows range from 8,192 tokens up to 2 million and above in the case of Google’s Gemini AI models. It’s hard to be specific, but in general a token represents around four characters of English text.
So, for instance, 100 tokens would be roughly 75 words. A common context window for today’s mainstream cloud-based models is around 128,000 tokens, or roughly 96,000 words.
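Those rules of thumb make for easy back-of-the-envelope arithmetic, as in this quick sketch (real tokenizers vary from model to model, so treat the numbers as estimates):

```python
# Back-of-the-envelope token arithmetic using the rules of thumb above:
# roughly four characters, or 0.75 words, of English text per token.
# Real tokenizers vary from model to model.

WORDS_PER_TOKEN = 0.75

def rough_words(tokens):
    return round(tokens * WORDS_PER_TOKEN)

print(rough_words(100))      # ~75 words
print(rough_words(128_000))  # ~96,000 words
```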
The ability to keep track of long conversations is crucial to making chatbots or virtual assistants valuable in real use cases.
That’s because the size of the context window significantly influences how well the AI can handle more complex interactions.
It’s the difference between having a normal conversation and struggling to be understood by someone who keeps forgetting what you’re discussing. Clearly the latter would be extremely frustrating.
A larger context window also lets the model access and remember a much wider range of information during the chat, which can aid its ability to return intelligent responses.
This ability to maintain context not only over time, but also across a breadth of data, is an increasingly important part of the utility of current models.
This is especially true with what are known as ‘thinking’ models, which take more time to evaluate all the options before giving a response.
Thinking has essentially replaced the old prompting practice of explicitly asking the model to think ‘step by step’, but the end result is the same.
Any aspect of AI which employs enhanced reflection or extended dialogue inevitably requires a longer context window to cope with the additional processing demands.
Advanced models typically employ a rolling context window, which adds new chat messages at one end of the memory while dropping older messages out at the other.
This ensures that the AI can always refer to earlier parts of a conversation when dealing with new user requests.
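Here’s a simplified sketch of how such a rolling window might work, using a crude whitespace token count in place of a real tokenizer:

```python
# Simplified sketch of a rolling context window: new messages go in at
# one end and the oldest drop out at the other whenever the total token
# count exceeds the budget. Counting tokens by splitting on whitespace
# is a crude stand-in for a model's real tokenizer.

from collections import deque

MAX_TOKENS = 50  # tiny budget so the rolling behaviour is easy to see

def count_tokens(text):
    return len(text.split())

window = deque()

def add_message(text):
    window.append(text)
    # Drop the oldest messages until the window fits the budget again.
    while sum(count_tokens(m) for m in window) > MAX_TOKENS:
        window.popleft()

for i in range(20):
    add_message(f"message number {i} padded with a few extra words")

print(list(window))  # only the most recent messages remain
```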
Where a context window is too small, or the user request is so large or complex that it overflows the window, the model may return a response that is nonsensical or wildly hallucinated.
The size of a context window is also important for web search and recommendation requests.
The general rule is the more complex the chat request, the larger the context window you need in the chosen model.
The downside of a larger context window is increased processing requirements, so it’s usual to only find large context windows in cloud-based AI models with their huge compute resources.
Small local desktop or open-source models are forced to employ smaller context windows because of the lower-powered computers they run on.
Despite these limitations, the continuing optimization and improving capabilities of local AI models mean that they will inevitably become more useful for everyday tasks over time.