Amazon Web Services, posted March 16, 2021

Amazon SageMaker now supports deploying multiple containers on a real-time endpoint for low-latency inference and invoking them independently on each request. This capability lets you run up to five different machine learning (ML) models and frameworks on a single endpoint and save up to 80% in cost. It is ideal when you have multiple ML models with similar resource needs and individual models don't have enough traffic to utilize the full capacity of the endpoint instances: for example, a set of ML models that are invoked infrequently or at different times, or dev/test endpoints.
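A minimal sketch of how this could look with boto3, assuming direct-invocation mode (`InferenceExecutionConfig: Direct`) and `TargetContainerHostname` routing as described in the SageMaker API. The model name, image URIs, S3 paths, and role ARN below are placeholders, and the live calls are shown in comments rather than executed:

```python
import json

# Two containers behind one model, in Direct mode so each request can
# target a specific container by its hostname (up to five containers).
model_config = {
    "ModelName": "multi-container-demo",  # hypothetical name
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "InferenceExecutionConfig": {"Mode": "Direct"},
    "Containers": [
        {
            "ContainerHostname": "tensorflow-model",
            "Image": "<tensorflow-inference-image-uri>",  # placeholder
            "ModelDataUrl": "s3://my-bucket/tf/model.tar.gz",  # placeholder
        },
        {
            "ContainerHostname": "pytorch-model",
            "Image": "<pytorch-inference-image-uri>",  # placeholder
            "ModelDataUrl": "s3://my-bucket/pt/model.tar.gz",  # placeholder
        },
    ],
}

# Against a live account, the flow would be (not executed here):
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_config)
#   ... create_endpoint_config / create_endpoint as usual ...
#   runtime = boto3.client("sagemaker-runtime")
#   runtime.invoke_endpoint(
#       EndpointName="multi-container-demo-endpoint",
#       TargetContainerHostname="pytorch-model",  # routes to one container
#       ContentType="application/json",
#       Body=json.dumps({"inputs": [1, 2, 3]}),
#   )

print(model_config["InferenceExecutionConfig"]["Mode"])  # Direct
```

Because both models share the same instances, each container must be able to coexist with the others' memory and CPU footprint, which is why the announcement targets models with similar, modest resource needs.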