AWS re:Invent Recap: Amazon SageMaker Debugger

What happened?

New capabilities for Amazon SageMaker Debugger, a tool that monitors machine learning training performance to help developers train models faster, were announced during the re:Invent 2020 Machine Learning keynote. Debugger now tracks system resource utilization and raises alerts when problems occur during training. It also provides automatic recommendations for allocating resources to training jobs, resulting in an optimized training process that reduces time and costs.

Why is it important?

  • Monitor Automatically: Amazon SageMaker Debugger enables developers to train their models faster through automatic monitoring of system resource utilization and alerts for training bottlenecks or bugs.
  • ID & Resolve Issues Faster: Automatic alerts and resource allocation recommendations from Amazon SageMaker Debugger help developers find and fix issues and bugs quickly.
  • Customize: With SageMaker Debugger, you can also create custom conditions that test for specific behavior in your training jobs.
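To make the idea of a custom condition concrete, here is a minimal sketch in plain Python of the kind of check such a condition might encode: flagging a training run whose loss has stopped improving. It deliberately does not use the SageMaker Debugger rule API; the function name `loss_not_decreasing` and its `window` and `min_delta` parameters are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def loss_not_decreasing(
    losses: Sequence[float], window: int = 5, min_delta: float = 0.01
) -> bool:
    """Return True when the most recent `window` losses have not improved
    on the best earlier loss by at least `min_delta`.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    if len(losses) <= window:
        return False  # not enough history to judge yet
    best_before = min(losses[:-window])   # best loss before the recent window
    best_recent = min(losses[-window:])   # best loss inside the recent window
    return (best_before - best_recent) < min_delta


# A run that keeps improving versus one that has plateaued
improving = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
plateaued = [1.0, 0.5, 0.5, 0.5, 0.5]
print(loss_not_decreasing(improving, window=3))
print(loss_not_decreasing(plateaued, window=3))
```

In a real training job, a check like this would be evaluated against the values Debugger captures during training rather than against a hand-written list.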

Why are we excited?

Amazon SageMaker Debugger lets data scientists iterate on an ML model to improve its accuracy and helps detect model training inconsistencies in real time. With that information and a little experience, we can pin down the actual problem in an ML model. It also integrates with AWS Lambda, which can automatically stop a training job when it is detected to be non-converging, resulting in lower costs and faster training time.
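The Lambda integration mentioned above could look roughly like the sketch below. It assumes the function is triggered by a training job status event whose `detail` field carries `TrainingJobName` and a `DebugRuleEvaluationStatuses` list, and that a status of `IssuesFound` means a rule has fired; treat that event shape as an assumption to verify against the events your account actually emits. The `client` parameter is a hypothetical addition so the handler can be exercised without AWS credentials.

```python
def should_stop(event: dict) -> bool:
    """Return True if any Debugger rule in the event reports an issue.

    Assumes the event shape described in the lead-in above.
    """
    statuses = event.get("detail", {}).get("DebugRuleEvaluationStatuses") or []
    return any(s.get("RuleEvaluationStatus") == "IssuesFound" for s in statuses)


def lambda_handler(event, context, client=None):
    """Stop the training job named in the event when a rule found issues."""
    job_name = event.get("detail", {}).get("TrainingJobName")
    if not job_name or not should_stop(event):
        return {"stopped": False}
    if client is None:
        # Imported lazily so the sketch can be tested without boto3 installed
        import boto3
        client = boto3.client("sagemaker")
    client.stop_training_job(TrainingJobName=job_name)
    return {"stopped": True, "job": job_name}
```

Stopping the job as soon as a rule reports an issue is what saves the cost: the instances stop billing instead of running a job that will not converge.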

Availability

Amazon SageMaker Debugger is now generally available in all AWS regions in the Americas and Europe and in some regions in Asia Pacific, with additional regions coming soon.

If you’re looking to explore these services further and need some guidance, let us know and we’ll connect you to an Idexcel expert!