In just three months, Google has already outdone itself. From Gemini 1.0's 32K-token limit to Gemini 1.5's 1M-token cap, Google solidifies its position as the benchmark for long-context foundation models.
In one of his rare X posts, Sundar Pichai, the CEO of Google and Alphabet, introduced Gemini 1.5 to the public and highlighted Google's significant 970K-token leap from Gemini 1.0 to Gemini 1.5.
In December, we launched Gemini 1.0 Pro. Today, we're introducing Gemini 1.5 Pro! 🚀
This next-gen model uses a Mixture-of-Experts (MoE) approach for more efficient training & higher-quality responses. Gemini 1.5 Pro, our mid-sized model, will soon come standard with a… pic.twitter.com/m2BNufHd8C
— Sundar Pichai (@sundarpichai) February 15, 2024
While OpenAI may have seemingly upstaged the Gemini 1.5 announcement with Sora, we cannot overlook how impressive this new model is and the role it will play in the upcoming months.
Let’s dive into it!
- What Are Long Context Windows and Their Real-Life Applications?
- So What? Can Gemini 1.5 Really Compete with ChatGPT?
- Now, What’s in it for u+
Long Context Windows and Their Real-Life Applications
In case the gravity of this announcement passed you by, Google articulates the importance of long context windows in their blog:
“Context windows are important because they help AI models recall information during a session… Remembering things in the flow of a conversation can be tricky for AI models, too—you might have had an experience where a chatbot ‘forgot’ information after a few turns. That’s where long context windows can help.”
To explain the real-life value of long context windows and this promising 1M token ceiling, Oriol Vinyals, one of the co-leads behind Gemini, shares this use case on X.
In the video, Vinyals uploaded the entire repository of the “three.js” library, which came out to 816,767 tokens, just 180K tokens shy of the model’s limit.
For context, three.js is a JavaScript library used for creating 3D computer graphics on the web. It provides a wide range of tools and features for rendering 3D scenes, including support for various types of geometries, materials, textures, lighting, and animations.
Once the model has ingested the entire contents of the library, Vinyals uses it as the knowledge base for the conversation. This is a game-changer for developers who need to reference many files at once while improving, editing, and revising their projects.
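Google hasn't published the exact code behind this demo, but for readers who want to experiment, here's a minimal sketch of what the workflow could look like with the Gemini Python SDK. It assumes you have an API key and a local clone of the repository; the model name, file filter, and example question are all illustrative, not taken from the demo.

```python
# Minimal sketch: stuff an entire local repository into one Gemini 1.5 Pro prompt
# and ask questions against it. Assumes the google-generativeai Python SDK and a
# local clone of three.js; model name, paths, and question are illustrative.
import pathlib
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your own key
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Concatenate every JavaScript source file into a single labeled context string.
repo = pathlib.Path("three.js/src")
context = ""
for path in sorted(repo.rglob("*.js")):
    context += f"\n\n===== {path} =====\n{path.read_text(encoding='utf-8', errors='ignore')}"

question = "Where is shadow map handling implemented, and how would I extend it?"

# Check that the repository plus question fits inside the 1M-token window.
print(model.count_tokens([context, question]).total_tokens)

response = model.generate_content([context, question])
print(response.text)
```

The point of the long context window is that no chunking, embedding, or retrieval pipeline is needed here; the whole codebase simply rides along with every question.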
The model retains and analyzes information not only from code libraries but also from books and documents. For instance, by having Gemini 1.5 ingest the complete Hunger Games trilogy, containing 301,583 words, users can extract accurate and precise answers to questions like "When did Peeta hide in the forest?" or prompts like "Create a comprehensive timeline illustrating the events leading to the downfall of President Snow's regime."
I know, tokens and long context windows can be confusing. To help us grasp the magnitude of this announcement, Reddit user u/DragonForg explains it this way:
Another use case I found really interesting is this demo Pichai shared on X.
In the video, Google uploaded a 44-minute silent film and prompted the model to identify the key plot points and events depicted in it. Impressively, the model accurately pinpointed the timestamps of the events the prompt asked for. By using a silent film, Google emphasizes that Gemini 1.5 can recall and interpret footage meticulously, frame by frame.
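Again, this is just an illustration rather than Google's actual setup: a rough sketch of how you might try a similar video Q&A with the Gemini File API in Python, assuming the SDK is installed and a video file is on disk. The file name, polling loop, and prompt are placeholders.

```python
# Minimal sketch: upload a long video via the Gemini File API and ask for
# timestamped plot points, similar to the silent-film demo. Assumes the
# google-generativeai Python SDK; file name and prompt are illustrative.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Upload the video; large files are processed asynchronously, so poll until ready.
video = genai.upload_file(path="silent_film_44min.mp4")
while video.state.name == "PROCESSING":
    time.sleep(10)
    video = genai.get_file(video.name)

prompt = ("List the key plot points of this film in order, "
          "and give the timestamp (MM:SS) where each one occurs.")

response = model.generate_content([video, prompt])
print(response.text)
```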
Now, imagine sifting through hours of livestreams or podcasts searching for a quote that deeply resonated with you, only to realize you forgot to jot it down. As a devoted user of tools like Gemini 1.5, I envision leveraging them in practical ways similar to what the video showed us. These tools help solve the challenges of remembering details, organizing notes, gathering data, and understanding complex information.
With models like this, we can focus on thinking and innovating, not just remembering.
Can Gemini 1.5 Really Compete with ChatGPT?
While Gemini 1.5 has exceeded public expectations and the standards set by Gemini 1.0, Reddit users appear to face challenges when utilizing the model to its fullest potential.
This user went so far as to suggest that dishonesty is the most effective approach to maximize the model’s output.
Meanwhile, this user humorously points out how AI models are increasingly mirroring human behavior.
On a positive note, a Redditor emphasized that these bugs and glitches mirror ChatGPT's early versions. According to the post, such errors are unavoidable and play a crucial role in the development phase.
In line with this, Reddit user u/MeaningfulThoughts highlights that because of these issues, ChatGPT's position as the leading foundation model is secure for the time being. Perhaps GPT-5 is on the horizon, poised to once again blow our minds.
While we are on the topic of ChatGPT, a Reddit user also noticed how OpenAI kind of trolled Google by announcing Sora, their very own text-to-video model, just hours after the Gemini 1.5 announcement.
This rally between the AI giants foreshadows more promising developments in 2024.
What’s in it for u+
Beyond the tokens, long context windows, and other technical jargon, we need to understand the potential that these AI models unlock for us.
I’m a firm believer that models similar to Gemini 1.5 will reshape how we consume and interact with media. Gone are the days of overwhelming our minds with unnecessary information or accidentally forgetting details we want to remember. In the long run, I imagine that AI will help us filter, declutter, reorganize, and remember information for a more enjoyable experience.
While some ponder the future, I believe we are already living in it.
Do you want to stay current with the latest AI news?
At a.i. + u, we deliver fresh, engaging, and digestible AI updates.
Stay tuned for more exciting developments!
Let’s see what stories we can bring to life next.
See you next edition!