AWS re:Invent Recap: Machine Learning Keynote

Here are the key announcements from the re:Invent 2020 Machine Learning Keynote:

  1. Faster Distributed Training on Amazon SageMaker is designed to be the fastest, most efficient way to train large deep learning models on large datasets. Using model parallelism and data parallelism, SageMaker distributed training automatically splits models and training datasets across AWS GPU instances, so training completes in significantly less time.
  2. Amazon SageMaker Clarify detects potential bias during data preparation, after model training, and in deployed models. This gives development teams greater visibility into their training data and models so they can address potential bias and explain predictions in greater detail.
  3. Deep Profiling for Amazon SageMaker Debugger helps developers train models faster by automatically monitoring system resource utilization and alerting them to training bottlenecks.
  4. Amazon SageMaker Edge Manager provides developers with the tools to optimize, secure, monitor, and maintain ML models on edge devices such as smart cameras, robots, personal computers, and mobile devices.
  5. Amazon Redshift ML lets data analysts, developers, and data scientists create, train, and deploy ML models using SQL commands. Teams can now build and train models on their Amazon Redshift data and apply those models to their own use cases.
  6. Amazon Neptune ML uses Graph Neural Networks (GNNs) to make easier, faster, and more accurate predictions on graph data. According to AWS, Neptune ML improves the accuracy of most graph predictions by more than 50% compared with non-graph prediction methods. Neptune ML automates selecting and training the best ML model for the graph data, and users can run ML on their graph directly through Neptune APIs and queries. ML teams can now create, train, and apply ML on Neptune data, cutting development time from weeks to a matter of hours.
  7. Amazon Lookout for Metrics uses ML to detect anomalies in your business metrics, so you can proactively monitor the health of your business, diagnose issues, and quickly identify opportunities to save costs, increase margins, and improve customer experience.
  8. Amazon HealthLake uses ML models to help healthcare and life sciences organizations aggregate health information from different silos and formats into a centralized AWS data lake, where the data is standardized.
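To make the data-parallel half of item 1 concrete, here is a minimal, framework-free Python sketch of the general idea. It is not SageMaker's distributed training library or its API. Each hypothetical worker computes a gradient on its own shard of the batch, a step standing in for a collective all-reduce averages those gradients, and every replica applies the same update. The toy linear model, the data, and all function names are illustrative assumptions, and model parallelism (splitting the model itself across GPUs) is not shown.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training example


def shard(batch: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split a batch into equally sized, contiguous shards, one per worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def local_gradient(w: float, samples: List[Sample]) -> float:
    """d/dw of mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def all_reduce_mean(grads: List[float]) -> float:
    """Stand-in for a collective all-reduce: average the workers' gradients."""
    return sum(grads) / len(grads)


def data_parallel_step(w: float, batch: List[Sample], num_workers: int, lr: float) -> float:
    # In a real cluster, each local_gradient call would run on a different GPU.
    grads = [local_gradient(w, s) for s in shard(batch, num_workers)]
    # Every worker applies the same averaged update, so the replicas stay in sync.
    return w - lr * all_reduce_mean(grads)


if __name__ == "__main__":
    batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # the true weight is 3.0
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, batch, num_workers=4, lr=0.01)
    print(round(w, 4))
```

Because the shards are equal in size, the averaged shard gradients equal the full-batch gradient, so splitting the work across workers changes where the computation happens, not the result of each update.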

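As an illustration of the kind of measurement behind item 2, the sketch below computes two pre-training bias metrics that the SageMaker Clarify documentation describes: Class Imbalance, CI = (n_a - n_d) / (n_a + n_d), and Difference in Proportions of Labels, DPL = q_a - q_d, where q is the share of positive labels in each facet. Clarify computes these for you; this hand-rolled Python version only shows the arithmetic. The facet column "group", the label column "label", and the toy records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def class_imbalance(records: List[Record], facet: str, advantaged: object, disadvantaged: object) -> float:
    """CI = (n_a - n_d) / (n_a + n_d): how unevenly the two facet values are represented."""
    n_a = sum(1 for r in records if r[facet] == advantaged)
    n_d = sum(1 for r in records if r[facet] == disadvantaged)
    return (n_a - n_d) / (n_a + n_d)


def diff_positive_proportions(records: List[Record], facet: str, advantaged: object,
                              disadvantaged: object, label: str = "label") -> float:
    """DPL = q_a - q_d, where q is the fraction of records with a positive (1) label."""
    def positive_rate(value: object) -> float:
        labels = [r[label] for r in records if r[facet] == value]
        return sum(labels) / len(labels)

    return positive_rate(advantaged) - positive_rate(disadvantaged)


if __name__ == "__main__":
    # Hypothetical data: facet "A" has 8 records (6 positive), facet "B" has 2 (1 positive).
    records = (
        [{"group": "A", "label": 1} for _ in range(6)]
        + [{"group": "A", "label": 0} for _ in range(2)]
        + [{"group": "B", "label": 1}, {"group": "B", "label": 0}]
    )
    print(class_imbalance(records, "group", "A", "B"))
    print(diff_positive_proportions(records, "group", "A", "B"))
```

A CI near 0 means both facets are similarly represented, and a DPL near 0 means they receive positive labels at similar rates; values far from 0 are the kind of signal Clarify surfaces for review.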
If you’re looking to explore these services further and need some guidance, let us know and we’ll connect you to an Idexcel expert!

AWS re:Invent Recap: SageMaker Data Wrangler

What happened?

The new service, SageMaker Data Wrangler, was announced during Andy Jassy's 2020 re:Invent keynote. Built into Amazon SageMaker, this tool simplifies the data preparation workflow so the entire process can be done from one central interface.

Why is it important?

  • SageMaker Data Wrangler contains over 300 built-in data transformations to normalize, transform, and combine features without having to write any code.
  • With SageMaker Data Wrangler’s visualization templates, transformations can be previewed and inspected in Amazon SageMaker Studio.
  • Data can be collected from multiple data sources and imported in a single step for transformation.
  • Data can come in various formats, such as CSV files, Parquet files, and database tables.
  • Data preparation workflows can be exported to a notebook or code script for use in Amazon SageMaker Pipelines or for later reuse.
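Data Wrangler applies its built-in transforms without code, but it helps to know what common ones compute, especially when reading an exported workflow. The plain-Python sketch below shows two typical feature transforms, min-max normalization and one-hot encoding. It is an illustration of the general techniques, not the code Data Wrangler generates, and the function names and sample columns are hypothetical.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale a numeric column linearly into [0, 1]; a constant column maps to all 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values: List[str]) -> List[Dict[str, int]]:
    """Encode a categorical column as one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


if __name__ == "__main__":
    ages = [20.0, 30.0, 40.0]           # hypothetical numeric feature
    plans = ["basic", "pro", "basic"]   # hypothetical categorical feature
    print(min_max_scale(ages))
    print(one_hot(plans))
```

Sorting the categories keeps the indicator columns in a stable order across runs, which matters when the same transform is replayed later on new data.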

Why We’re Excited

SageMaker Data Wrangler makes it easier for data scientists to prepare data for machine learning training using its built-in data preparation options. With data preparation completed more quickly, our data science teams can deliver solutions to clients faster.
