AI hardware startup Cerebras has created a new AI inference solution that could rival Nvidia’s GPU offerings for enterprises.
The Cerebras Inference tool is based on the company’s Wafer-Scale Engine and promises to deliver staggering performance. According to sources, the tool has achieved speeds of 1,800 tokens per second for Llama 3.1 8B, and 450 tokens per second for Llama 3.1 70B. Cerebras claims these speeds are not only faster than those of the typical hyperscale cloud services built on Nvidia’s GPUs, but also more cost-efficient.
This taps into a major shift in the generative AI market, as Gartner analyst Arun Chandrasekaran put it. While the market’s focus had previously been on training, it is now shifting towards the cost and speed of inferencing. This shift is driven by the growth of AI use cases in enterprise settings, and it gives vendors of AI products and services, such as Cerebras, an opportunity to compete on performance.
As Micah Hill-Smith, co-founder and CEO of Artificial Analysis, says, Cerebras really shone in its AI inference benchmarks. The company’s measurements reached over 1,800 output tokens per second on Llama 3.1 8B, and over 446 output tokens per second on Llama 3.1 70B, setting new records in both benchmarks.
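As a back-of-the-envelope illustration (not taken from the benchmarks themselves), the quoted throughputs can be translated into user-facing response latency. The throughput figures below are the ones reported in the article; the 500-token response length is a hypothetical example:

```python
# Illustrative sketch: converting reported output throughput (tokens/second)
# into the time needed to generate a response of a given length.

def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `num_tokens` at a steady output rate."""
    return num_tokens / tokens_per_second

# Throughputs reported for Cerebras Inference (output tokens/second)
LLAMA_3_1_8B = 1800
LLAMA_3_1_70B = 446

# Hypothetical ~500-token answer (roughly a long paragraph or short page)
print(f"Llama 3.1 8B:  {generation_time(500, LLAMA_3_1_8B):.2f} s")   # ≈ 0.28 s
print(f"Llama 3.1 70B: {generation_time(500, LLAMA_3_1_70B):.2f} s")  # ≈ 1.12 s
```

At these rates a multi-hundred-token answer arrives in roughly a second or less, which is what makes the "real-time agent" use cases discussed later plausible.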
However, despite the potential performance advantages, Cerebras faces significant challenges in the enterprise market. Nvidia’s software and hardware stack dominates the industry and is widely adopted by enterprises. David Nicholson, an analyst at Futurum Group, points out that while Cerebras’ wafer-scale system can deliver high performance at a lower cost than Nvidia, the key question is whether enterprises are willing to adapt their engineering processes to work with Cerebras’ system.
The choice between Nvidia and alternatives such as Cerebras depends on several factors, including the scale of operations and available capital. Smaller firms are likely to choose Nvidia, since it offers already-established solutions, while larger businesses with more capital may opt for alternatives such as Cerebras to increase efficiency and save on costs.
As the AI hardware market continues to evolve, Cerebras will also face competition from specialised cloud providers, hyperscalers like Microsoft, AWS, and Google, and dedicated inferencing providers such as Groq. The balance between performance, cost, and ease of implementation will likely shape enterprise decisions in adopting new inference technologies.
The emergence of high-speed AI inference, capable of exceeding 1,000 tokens per second, is comparable to the arrival of broadband internet, and could open a new frontier for AI applications. Cerebras’ 16-bit precision and faster inference capabilities may enable future AI applications in which entire AI agents must operate rapidly, repeatedly, and in real time.
With the growth of the AI field, the market for AI inference hardware is also expanding, accounting for around 40% of the total AI hardware market and becoming an increasingly lucrative target. Given that more prominent companies occupy the majority of this segment, newcomers should weigh the competitive landscape carefully and the significant resources required to navigate the enterprise space.
(Photo by Timothy Dykes)
By AI News, August 29, 2024.