
Announcing support for multiple containers on Amazon SageMaker Inference endpoints, leading to cost savings of up to 80%



Amazon SageMaker now supports deploying multiple containers on real-time endpoints for low-latency inference and invoking them independently on a per-request basis. This new capability lets you run up to five different machine learning (ML) models and frameworks on a single endpoint and save up to 80% in costs. This option is ideal when you have multiple ML models with similar resource needs and individual models don't have enough traffic to utilize the full capacity of the endpoint instances. For example, you might have a set of ML models that are invoked infrequently or at different times, or a collection of dev/test endpoints.
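As a rough sketch of how this looks with boto3, the flow is to create a model with a `Containers` list and `InferenceExecutionConfig` set to `Direct` mode, then route each request to a specific container via `TargetContainerHostname` when invoking the endpoint. The role ARN, image URIs, S3 paths, and endpoint name below are hypothetical placeholders, not values from the announcement; the actual API calls are shown commented out since they require AWS credentials and deployed resources.

```python
import json

# Request payload for sagemaker.create_model(**create_model_request).
# Up to five containers can be listed; "Direct" mode makes each one
# individually addressable per request (placeholder names/ARNs/URIs).
create_model_request = {
    "ModelName": "multi-container-model",
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",
    "Containers": [
        {
            "ContainerHostname": "xgboost",
            "Image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
            "ModelDataUrl": "s3://example-bucket/xgboost/model.tar.gz",
        },
        {
            "ContainerHostname": "pytorch",
            "Image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/pytorch:latest",
            "ModelDataUrl": "s3://example-bucket/pytorch/model.tar.gz",
        },
    ],
    "InferenceExecutionConfig": {"Mode": "Direct"},
}
# import boto3
# boto3.client("sagemaker").create_model(**create_model_request)

# Request payload for sagemaker-runtime.invoke_endpoint(**invoke_request).
# TargetContainerHostname selects which container serves this request.
invoke_request = {
    "EndpointName": "multi-container-endpoint",
    "TargetContainerHostname": "pytorch",
    "ContentType": "application/json",
    "Body": json.dumps({"instances": [[1.0, 2.0, 3.0]]}),
}
# boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_request)
```

After `create_model`, the endpoint itself is created the usual way (endpoint config plus endpoint); only the model definition and the per-request container targeting differ from a single-container deployment.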

View the full article
