AWS Unveils Cost-Cutting Features for Bedrock LLM Hosting Service at re:Invent Conference

Elliot Kim

December 04, 2024 · 3 min read

AWS has introduced two new features for its Bedrock Large Language Model (LLM) hosting service, aimed at reducing costs and latency for businesses using generative AI in production. The announcements were made at the company's re:Invent conference in Las Vegas, where AWS showcased its efforts to make LLMs more accessible and affordable for enterprises.

The first feature, prompt caching, allows businesses to significantly reduce the cost of using large language models. According to Atul Deo, director of product for Bedrock, caching ensures that the model doesn't have to reprocess the same or similar queries multiple times, resulting in cost savings of up to 90%. The feature is particularly useful with long context windows, such as those of AWS's new Nova models, which support up to 300,000 tokens of context today, with 2-million-token windows planned. Adobe, which tested prompt caching for some of its generative AI applications on Bedrock, saw a 72% reduction in response time.
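To make the mechanics concrete, here is a minimal sketch of how prompt caching can be invoked through Bedrock's Converse API with boto3. It assumes a region and model that support cache checkpoints; the file name, system prompt, and query are illustrative only:

```python
import boto3

# Sketch: prompt caching via the Bedrock Converse API (boto3).
client = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("contract.txt") as f:  # large document reused across many calls
    long_document = f.read()

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    system=[
        {"text": "You are a contract-review assistant."},
        {"text": long_document},
        # Everything above this checkpoint is cached, so repeat requests
        # skip reprocessing the long document prefix.
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "Summarize the termination clause."}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
# On repeat calls, the usage block reports cache activity (e.g.
# cacheReadInputTokens), which is where the cost savings show up.
```

The key design point is that the savings come from the stable prefix: the cached document is processed once, and only the short trailing query is reprocessed on each subsequent request.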

The second feature, intelligent prompt routing, enables Bedrock to automatically route prompts to different models in the same model family, striking a balance between performance and cost. This system uses a small language model to predict how each model will perform for a given query and then routes the request accordingly. Deo explained that this feature allows businesses to avoid using the most capable and expensive models for simple queries, reducing costs and latency.
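In practice, routing is exposed through the same Converse API: instead of a specific model ID, the request targets a prompt-router resource. The sketch below assumes boto3 and uses a placeholder router ARN; real ARNs come from the Bedrock console or API:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN for a default prompt router over the Claude family;
# the account ID and router name here are illustrative.
router_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"
)

response = client.converse(
    modelId=router_arn,  # target the router, not an individual model
    messages=[{"role": "user", "content": [{"text": "What is 2 + 2?"}]}],
)

print(response["output"]["message"]["content"][0]["text"])
# For a simple query like this, the router should select a smaller,
# cheaper model in the family; the response trace (when present)
# records which model actually served the request.
```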

While LLM routing isn't a new concept, AWS says its offering stands out by requiring minimal human input. However, the system is currently limited to routing queries between models in the same model family. Deo said the team plans to expand the system and give users more customization options over time.

In addition to these features, AWS is launching a new marketplace for Bedrock that will offer around 100 emerging and specialized models. Unlike the standard Bedrock catalog, where AWS handles infrastructure automatically, users will have to provision and manage capacity for these models themselves. The move is a response to customer demand for specialized models that may have only a small number of dedicated users.

The introduction of these features and the marketplace marks a significant step forward for AWS in making LLMs more accessible and affordable for businesses. As the use of generative AI becomes more widespread, the need for cost-effective and efficient solutions will only continue to grow. With these announcements, AWS is positioning itself as a leader in the LLM hosting space, providing enterprises with the tools they need to harness the power of AI.

The implications of these features extend beyond cost savings and latency reduction. They also have the potential to democratize access to LLMs, enabling more businesses to leverage the power of AI in their operations. As the technology continues to evolve, it will be interesting to see how AWS and other players in the space respond to the growing demand for efficient and cost-effective LLM solutions.
