The Fastest AI Chip Is Here and It Didn’t Cost $7 TRILLION, Sam Altman

Groq’s lightning-speed computer chip enables open-source models to respond in 0.22 seconds, which is up to 0.17 seconds faster than other inference providers. Will this answer AI’s need for speed?

Speed is very important when it comes to generative AI.

When conversing with an AI chatbot, you want it to respond stat. If you’re asking it to compose an email, you want the results in seconds so you can move on to your next task. Isn’t that the point of these AI tools, to make our work more efficient?

But what happens when these responses take minutes?
Do we lose trust in the AI model?

Groq, a computer chip company, is dedicated to addressing these latency challenges. Its chip, the Language Processing Unit (LPU), introduces a new approach to data processing: specialized hardware crafted to handle human language efficiently. With this technology, the startup aims to deliver AI responses up to 75 times faster than human typing speed.

Not to be confused with Elon Musk’s Grok, Groq specializes in developing high-performance processors and software solutions for AI, machine learning (ML), and high-performance computing applications.

I know, I find the phonetic similarity funny too; even Groq pokes fun at it.

While the startup doesn’t train its own AI language models, it can dramatically accelerate models like Meta’s Llama2-70B and Mistral’s Mixtral-8x7B. In other words, Groq serves as a vehicle for speeding up open-source models.

Alongside these existing models, Tom Ellis from Groq mentioned that custom models are in development, though the current focus is on expanding the company’s open-source model options. This approach gives startups with limited resources access to the same processing capabilities as larger companies.

Moreover, an X post from Dina Yarlen shows how fast Mixtral-8x7B responds to a prompt when run via Groq. Yarlen then ran the same prompt through ChatGPT, and the difference in speed is clear. If you’re pressed for time and seeking rapid responses, Groq may be your top pick.

As our modes of communication evolve, so do the mediums we use. In the age of AI, could LPUs become our unsung heroes?

Let’s talk about Groq!

What’s in a name? Groq vs. Grok

The main difference between Grok and Groq lies in their respective domains.

Grok, launched on Nov. 4, 2023, is a chatbot from xAI, a company focused on AI products such as chatbots, image recognition, and machine learning. On the other hand, Groq, founded in 2016, is an AI chip company specializing in chips that accelerate generative AI models.

The name Grok originates from the science fiction novel “Stranger in a Strange Land” written by Robert A. Heinlein in 1961. In the context of the book, “grok” is a Martian word that translates roughly to “to drink” but carries the deeper meaning of “to understand intimately and completely in a worldview-transforming way.”

The term implies that once someone groks something, they have absorbed it so thoroughly that it becomes part of them, merging or blending with it.

The choice of the name Grok reflects the desire to create an AI that understands humanity deeply and intuitively, mirroring the concept introduced by Heinlein’s work.

On the other hand, the name “Groq” is associated with qualities like talent, caretaking, and attractiveness, and it was chosen with the vision of making AI accessible to all. The name was officially trademarked in 2016, the same year Jonathan Ross founded the chip company.

Before establishing Groq, Ross started the project that became Google’s Tensor Processing Unit (TPU), designing and implementing core elements of the first-generation TPU chip.

Despite the similarity in sound, the difference in spelling between “Groq” and “Grok” is unlikely to cause confusion from a trademark perspective.

However, I do still find it humorous that almost everyone in the AI space clarifies the distinction between the two when talking about the chip and the chatbot at the same time.

Beyond technicalities and trademarks, Groq had already raised this concern on Nov. 29, 2023, in a letter to Elon Musk, playfully proposing that he rename Grok to Slartibartfast.

With Groq becoming more accessible to users, it will be interesting to see where this conversation goes. As of this writing, there has been no direct statement from Musk or xAI about the comparisons between the chatbot and the chip company.

CPUs, GPUs, TPUs, Now LPUs: What’s with all these Us?

If Language Processing Unit, or LPU, is a new term to you, here’s a quick primer.

  • The LPU is a special kind of computer chip designed to handle language tasks very quickly. Unlike chips built for parallel processing (doing many things at once), the LPU uses sequential processing (working through tasks one after another), which matches the way language models generate text one token at a time.
  • Groq took a different approach from the start, focusing on software and compiler development before designing the hardware. The software dictates how the chips talk to each other, so they work together seamlessly, like a team in a factory. As a result, the LPU is built to overcome the two main LLM bottlenecks, compute density and memory bandwidth, which lets Groq serve models at much higher volume. To make this possible, Groq increased the number of cores from 48 to 192 and added more memory.
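To make that concrete, here’s a minimal sketch of what calling an open-source model through Groq looks like in practice. It assumes Groq’s official `groq` Python SDK (which mirrors the familiar OpenAI-style chat interface) and a `GROQ_API_KEY` environment variable; the model ID `mixtral-8x7b-32768` is one Groq listed at the time of writing, so substitute whatever the console offers today.

```python
# Minimal sketch: running Mixtral-8x7B on Groq's LPU-backed cloud.
# Assumes `pip install groq` and a GROQ_API_KEY environment variable.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# "mixtral-8x7b-32768" was a model ID Groq hosted at the time of
# writing; check the console for the current list.
completion = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    messages=[
        {"role": "user", "content": "In two sentences, what is an LPU?"}
    ],
)
print(completion.choices[0].message.content)
```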

In a paper published in 2020, titled “Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads,” Groq referred to this new technology as a Tensor Streaming Processor (TSP).

The term “LPU” is a recent addition to Groq’s lore; it never appears in that paper. YouTuber 1littlecoder even speculates that Groq may have rebranded the chip to align its identity with the rising popularity of large language models (LLMs).

What do you think?

Now, let’s delve into what differentiates the LPU from the CPU, GPU, and TPU in detail.

These differences show how Groq adds value to open-source AI models, and how this new technology could change the way we develop and use them.

This YouTube video shows how fast Groq replies to CNN reporter Becky Anderson. It’s like we’re getting closer to the day when talking to AI models will be as easy as making a phone call.

To make our understanding more concrete, let’s take a look at the LLMPerf leaderboard. The following graph compares the average output tokens per second across 150 requests using the Llama2-70B model. The data shows Groq leading its competitors by 119 tokens per second.

This means Groq processes the highest number of tokens per second among the providers measured. So if you like using Llama2-70B, Groq could be your new inference provider.

Another important factor in large language model inference is “time to first token,” or TTFT: how long the model takes to return the first token of its response to your prompt. Below is a chart showing how different LLM inference providers perform:

In this graph, Groq responded to the user in just 0.22 seconds. The chip trails Anyscale, a unified computing platform, by only 0.01 seconds, and it outpaces the remaining competitors by 0.15 seconds. If you want quick responses and need to tackle latency head-on, Groq might be the optimal choice.
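If you want to sanity-check numbers like these yourself, here’s a rough sketch that measures both metrics, TTFT and output tokens per second, against Groq’s streaming API. It again assumes the `groq` SDK, and it counts streamed chunks as a stand-in for tokens, so treat the result as a ballpark figure rather than a benchmark-grade measurement.

```python
# Rough sketch: measure time to first token (TTFT) and approximate
# output tokens per second against Groq's streaming endpoint.
# Assumes the `groq` SDK; streamed chunks only approximate tokens.
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="llama2-70b-4096",  # the Llama2-70B ID Groq used at the time
    messages=[{"role": "user", "content": "Write a haiku about speed."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        chunks += 1

elapsed = time.perf_counter() - start
print(f"TTFT: {first_token_at - start:.2f} s")
print(f"~{chunks / elapsed:.0f} tokens/s over the whole response")
```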

While the LPU holds much promise, there are concerns about its capabilities.

A former Groq engineer took to Hacker News to point out inaccuracies in the company’s branding and the potential to mislead a wider user base.

On a positive note, many are embracing the possibility of NVIDIA no longer dominating the GPU market, which could drive significant technological advancements. This shift would require NVIDIA, a well-established brand, to enhance and innovate its solutions to remain competitive as other players rise with comparable features.

Furthermore, a more diverse industry would promote price competition, freeing users from being tied to a single legacy brand.

Whatever skepticism we may hold about this technology’s accuracy and potential, we cannot overlook the speed at which it interacts with us and the breadth of use cases it offers.

What’s in it for u+

What’s the value of having an AI chatbot respond to me in 0.22 seconds anyway?

  • High-Performance Language Processing + AI Acceleration

    Groq’s LPU system has set new performance records for speed and accuracy in processing LLMs, achieving over 300 tokens per second per user on Meta AI’s Llama2-70B model.

    Moreover, LPUs like Groq’s are purpose-built for sequential and compute-intensive tasks, making them ideal for accelerating AI applications, particularly in natural language processing tasks.

  • Improved Latency and Scale + Efficient Inference Engine

    LPUs address latency and scale-related issues faced by traditional GPU-based systems when processing LLMs, offering ultra-low latency and enhanced performance.

    With this, Groq’s LPU system enables better performance and quality for both open-source and customer-proprietary models, enhancing the potential return on investment (ROI) for integrating LLMs into various tools and services.

  • Text Analytics and Sentiment Analysis + Grammar-Checking

    LPUs facilitate text analytics processes by extracting meaningful insights from large volumes of text data, enabling sentiment analysis for understanding customer emotions and trends to make informed business decisions.

    On top of this, LPUs can be applied in various sectors such as banking, insurance, health, advertising, public relations, publishing, and more to enhance customer-facing interactions through NLP technologies like voice assistants, chatbots, grammar-checking tools, sentiment analysis, text analytics, and machine translation.

    By leveraging this automation, we can put our energy into generating more projects while AI models help us evaluate their quality. No need to get stuck on minor errors; these tools will catch them for us (see the runnable sketch after this list).

  • Speech-to-Text Conversion + Chatbots and Virtual Assistants

    LPUs enable speech-to-text systems that convert spoken language into written text efficiently, enhancing communication channels and customer service interactions.

    Plus, LPUs power chatbots and virtual assistants with Natural Language Generation (NLG) capabilities to engage with customers effectively, answer queries, provide support, and drive lead generation in businesses.

    There will be a time when the automated calls we receive won’t originate from humans, but rather from AI models fueled by inference providers such as Groq.
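To ground the sentiment-analysis use case above in something runnable, here’s a toy sketch that routes a batch of reviews through a Groq-hosted open-source model. The one-word-label prompt and the sample reviews are my own illustration, not an official Groq recipe, under the same `groq` SDK assumption as before.

```python
# Toy sentiment analysis routed through a Groq-hosted open-source
# model. The prompt format is an illustration, not an official
# recipe; assumes the `groq` SDK and GROQ_API_KEY as before.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

reviews = [
    "Onboarding was painless and support replied within minutes.",
    "Checkout crashed twice and nobody answered my ticket.",
]

for review in reviews:
    result = client.chat.completions.create(
        model="mixtral-8x7b-32768",
        messages=[
            {
                "role": "system",
                "content": "Label the sentiment of the user's text as "
                           "positive, negative, or neutral. "
                           "Reply with one word.",
            },
            {"role": "user", "content": review},
        ],
    )
    print(f"{result.choices[0].message.content.strip():<10} {review}")
```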

At its core, Groq’s LPU offers a significant advancement in AI acceleration and natural language processing capabilities, with diverse applications across industries to enhance user experiences, streamline operations, and drive innovation in AI technologies.

Waste no time and test Groq’s lightning-speed responses HERE.

Now, it’s time to think at the speed of light.

Do you want to stay current with the latest AI news?
At a.i. + u, we deliver fresh, engaging, and digestible AI updates.

Stay tuned for more exciting developments!
Let’s see what stories we can bring to life next.

See you next addition!