Amazon Web Services, posted March 16, 2021

Amazon SageMaker now supports deploying multiple containers on a real-time endpoint for low-latency inference and invoking them independently on each request. This capability lets you run up to five different machine learning (ML) models and frameworks on a single endpoint and save up to 80% in cost. It is ideal when you have multiple ML models with similar resource needs and individual models don't have enough traffic to utilize the full capacity of the endpoint instances: for example, a set of ML models that are invoked infrequently or at different times, or dev/test endpoints.
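A minimal sketch of how this could look with boto3, assuming direct-invocation mode (`InferenceExecutionConfig: Direct`) and `TargetContainerHostname` routing as described in the SageMaker API. The model name, image URIs, S3 paths, and role ARN below are placeholders, and the live calls are shown in comments rather than executed:

```python
import json

# Two containers behind one model, in Direct mode so each request can
# target a specific container by its hostname (up to five containers).
model_config = {
    "ModelName": "multi-container-demo",  # hypothetical name
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "InferenceExecutionConfig": {"Mode": "Direct"},
    "Containers": [
        {
            "ContainerHostname": "tensorflow-model",
            "Image": "<tensorflow-inference-image-uri>",  # placeholder
            "ModelDataUrl": "s3://my-bucket/tf/model.tar.gz",  # placeholder
        },
        {
            "ContainerHostname": "pytorch-model",
            "Image": "<pytorch-inference-image-uri>",  # placeholder
            "ModelDataUrl": "s3://my-bucket/pt/model.tar.gz",  # placeholder
        },
    ],
}

# Against a live account, the flow would be (not executed here):
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_config)
#   ... create_endpoint_config / create_endpoint as usual ...
#   runtime = boto3.client("sagemaker-runtime")
#   runtime.invoke_endpoint(
#       EndpointName="multi-container-demo-endpoint",
#       TargetContainerHostname="pytorch-model",  # routes to one container
#       ContentType="application/json",
#       Body=json.dumps({"inputs": [1, 2, 3]}),
#   )

print(model_config["InferenceExecutionConfig"]["Mode"])  # Direct
```

Because both models share the same instances, each container must be able to coexist with the others' memory and CPU footprint, which is why the announcement targets models with similar, modest resource needs.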