Introducing distributed training on Amazon SageMaker

Amazon Web Services · December 8, 2020

Today we are introducing Amazon SageMaker distributed training, the fastest and easiest way to train large deep learning models on large datasets. Using partitioning algorithms, SageMaker distributed training automatically splits large deep learning models and training datasets across AWS GPU instances in a fraction of the time it takes to do so manually. SageMaker achieves these efficiencies through two techniques: model parallelism and data parallelism. Model parallelism splits a model that is too large to fit on a single GPU into smaller parts and distributes them across multiple GPUs for training. Data parallelism splits a large dataset across multiple GPUs so that the parts are processed concurrently, which improves training speed.
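The difference between the two techniques can be illustrated with a minimal sketch in plain Python. This is not the SageMaker API, and it does not reproduce the partitioning algorithms SageMaker uses, which the announcement does not describe. The function names `shard_dataset` and `partition_layers`, the round-robin sharding, and the greedy balancing rule are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def shard_dataset(samples: Sequence[int], num_workers: int) -> List[List[int]]:
    """Data parallelism (illustrative): each worker holds a full model copy
    and receives a disjoint, near-equal slice of the dataset."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Round-robin assignment: worker `rank` takes every num_workers-th sample.
    return [list(samples[rank::num_workers]) for rank in range(num_workers)]


def partition_layers(layer_sizes: Sequence[int], num_devices: int) -> List[List[int]]:
    """Model parallelism (illustrative): assign contiguous layers to devices,
    greedily balancing parameter counts. Returns layer indices per device."""
    if num_devices < 1 or num_devices > len(layer_sizes):
        raise ValueError("need between 1 and len(layer_sizes) devices")
    target = sum(layer_sizes) / num_devices  # ideal parameter load per device
    partitions: List[List[int]] = [[] for _ in range(num_devices)]
    device, load = 0, 0
    for index, size in enumerate(layer_sizes):
        remaining_layers = len(layer_sizes) - index
        devices_after = num_devices - device - 1
        # Advance when the current device has reached its target, or when the
        # remaining layers are only just enough to give each later device one.
        if partitions[device] and device < num_devices - 1 and (
            load >= target or remaining_layers <= devices_after
        ):
            device += 1
            load = 0
        partitions[device].append(index)
        load += size
    return partitions


if __name__ == "__main__":
    # Ten sample IDs split across three hypothetical GPU workers.
    print(shard_dataset(list(range(10)), 3))
    # Six layers (sizes are hypothetical parameter counts) split across two GPUs.
    print(partition_layers([4, 4, 2, 2, 2, 2], 2))
```

In the data-parallel case every worker trains the same model on different samples. In the model-parallel case each device holds only a subset of the layers, so no single GPU needs to fit the whole model.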
