Machine Learning @ Amazon Ralf Herbrich 3/10/2021
Overview • What is Machine Learning? • Machine Learning @ Amazon • Forecasting • Content Linkage • Scalable Machine Learning Services • Visual Services • Recommendation & Ranking • Conclusion
Our Customers • Over 237 Million Active Customer Accounts (as of 12/31/13) • More than 2 Million Active Seller Accounts (as of 10/30/13) • More than one million active AWS Customers • Tools to produce and distribute unique ideas for Authors
Machine Learning: The Science • Science, Artificial Intelligence, Engineering • Rule extraction from data • Inspired by human learning • Adaptive algorithms • Training: Data → Models • Prediction: Models → Forecast • Decision: Forecast → Actions • Computer Science, Statistics, Neuroscience, Operations Research
Machine Learning: Key Concepts • Data • Models • Prediction • Learning • Parameters • Utility Function • Decision Making
Machine Learning: Formal Definition • Labelled Data • Unlabelled Data • Probability is a central concept in Machine Learning!
Why Probability? 1. Mathematics of Uncertainty (Cox’s axioms)
Cox Axioms: Probabilities and Beliefs • Design: System must assign a degree of plausibility P(A) to each logical statement A. • Axioms: • P(A) is a real number • P(A) is independent of Boolean rewrite • P must be a probability measure!
Why Probability? 1. Mathematics of Uncertainty (Cox’s axioms) 2. Variables and Factors map to Memory & CPU
Factor Graphs • Definition: Graphical representation of the product structure of a function (Wiberg, 1996) • Nodes: Factors and Variables • Edges: Dependencies of factors on variables • Semantics: Local variable dependency of factors (example graph over variables a, b, c)
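To make the "product structure" idea concrete, here is a minimal sketch, assuming a toy function f(a, b, c) = f1(a, b) · f2(b, c) over binary variables. The factor tables `f1` and `f2` and both function names are hypothetical and not taken from the slides. It shows that the marginal of b computed from two local messages matches the brute-force sum over all assignments.

```python
from itertools import product

# Hypothetical factor tables for f(a, b, c) = f1(a, b) * f2(b, c).
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # keyed by (a, b)
f2 = {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 2.0, (1, 1): 1.0}  # keyed by (b, c)


def marginal_b_brute_force():
    """Sum the full product over every joint assignment of (a, b, c)."""
    totals = [0.0, 0.0]
    for a, b, c in product((0, 1), repeat=3):
        totals[b] += f1[(a, b)] * f2[(b, c)]
    return totals


def marginal_b_message_passing():
    """Each factor sums out its other variable locally and sends a message to b."""
    msg_f1_to_b = [sum(f1[(a, b)] for a in (0, 1)) for b in (0, 1)]
    msg_f2_to_b = [sum(f2[(b, c)] for c in (0, 1)) for b in (0, 1)]
    # The (unnormalized) marginal of b is the product of its incoming messages.
    return [m1 * m2 for m1, m2 in zip(msg_f1_to_b, msg_f2_to_b)]


print(marginal_b_brute_force())
print(marginal_b_message_passing())
```

Because each message only touches one factor, the work per factor stays local, which is the property the deck relates to memory and CPU.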
Inference in a Factor Graph [Figure: factor graph over variables s1, s2, s3, s4, t1, t2, t3, y12, y23]
Factor Graphs and Cloud Computing • Belief Store (“Memory”): parameters ϑ1, ..., ϑ5 • Message Passing (“Communicate”) • Data Messages (“Compute”): data Y1, ..., Y7
Why Probability? 1. Mathematics of Uncertainty (Cox’s axioms) 2. Variables and Factors map to Memory & CPU 3. Decouple Data Modeling and Decision Making
Infer-Predict-Decide Cycle • Inference: P(Parameters) + Data → P(Parameters|Data) • Requires a (structural) model P(Data|Parameters) • Allows prior information to be incorporated • Prediction: P(Parameters) + Data → P(Data) • Requires integration/summation over parameter uncertainty • Does not change state! • Decision Making: Loss(Action, Data) + P(Data) → Action • Business loss, not learning loss! • Often involves optimization!
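The three stages of the cycle can be sketched with a toy Beta-Bernoulli model. This is an illustration under stated assumptions, not the method used at Amazon: the model choice, the function names `infer`, `predict`, and `decide`, and the loss table are all hypothetical.

```python
def infer(prior, data):
    """Inference: P(Parameters) + Data -> P(Parameters | Data).

    prior is a Beta(alpha, beta) belief given as a tuple; data is a list of 0/1 outcomes.
    """
    alpha, beta = prior
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes


def predict(belief):
    """Prediction: integrate out the parameter to get P(next outcome = 1)."""
    alpha, beta = belief
    return alpha / (alpha + beta)


def decide(p_success, loss):
    """Decision: pick the action with the smallest expected (business) loss.

    loss maps action -> (loss if outcome is 1, loss if outcome is 0).
    """
    expected = {
        action: p_success * loss_if_1 + (1.0 - p_success) * loss_if_0
        for action, (loss_if_1, loss_if_0) in loss.items()
    }
    return min(expected, key=expected.get)


posterior = infer((1, 1), [1, 1, 0, 1])          # uniform prior, four observations
p_next = predict(posterior)                      # leaves the posterior unchanged
action = decide(p_next, {"stock": (0.0, 1.0), "skip": (3.0, 0.0)})
print(posterior, p_next, action)
```

Note that the loss table enters only in `decide`, so the same posterior can serve several decisions with different business losses.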
Machine Learning Opportunities @ Amazon (areas: Retail, Customers, Seller, Catalog, Digital) • Demand Forecasting • Vendor Lead Time Prediction • Pricing • Packaging • Substitute Prediction • Product Recommendation • Product Search • Visual Search • Product Ads • Shopping Advice • Customer Problem Detection • Fraud Detection • Predictive Help • Seller Search & Crawling • Browse-Node Classification • Meta-data validation • Review Analysis • Named-Entity Extraction • XRay • Plagiarism Detection
ML @ Amazon • Forecasting (Retail) • Content Linkage (Digital) • Scalable Algorithms & Services (AWS) • Visual Services (Retail & Digital) • Recommendation (Retail)
Forecasting Setting • Given past sales of a product in every region, predict regional demand up to one year into the future Challenges • New Products: No past demand! • Regionalized: 82 fulfillment centers worldwide • Sparsity: Huge skew, many products sell very few items • Seasonal: Huge variation due to external, seasonal events • Distributions: Future is uncertain, so predictions must be distributions • Scale: 20 M+ products fulfilled by Amazon alone! • Orders: Customers demand bundles of products • Censored: Past sales ≠ past demand (inventory constraint)
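The "Censored" challenge can be illustrated with a small likelihood sketch. It assumes, purely for illustration, that demand is Poisson-distributed and that observed sales equal min(demand, inventory); neither the Poisson choice nor the function names come from the slides.

```python
import math


def poisson_log_pmf(k, rate):
    """Log-probability of exactly k units of demand under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def censored_log_likelihood(sales, inventory, rate):
    """Log-likelihood of observed sales when sales = min(demand, inventory).

    If the product did not sell out, demand equals sales. If it sold out, we only
    know that demand was at least the inventory level.
    """
    if sales < inventory:
        return poisson_log_pmf(sales, rate)
    below = sum(math.exp(poisson_log_pmf(k, rate)) for k in range(inventory))
    tail = max(1.0 - below, 1e-300)  # P(demand >= inventory), guarded against underflow
    return math.log(tail)


# A sell-out at 3 units is more plausible under rate 2.0 than "demand was exactly 3".
print(censored_log_likelihood(3, 3, 2.0), poisson_log_pmf(3, 2.0))
```

Treating a sell-out as an exact demand observation would bias the fitted rate downward, which is the point of "past sales ≠ past demand".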
Demand Forecasting
Content Linkage Setting • Enrich every piece of digital content continuously by linking it to relevant content on Amazon and the Web Challenges • Scale: Millions of books, with thousands added each day! • Languages: Over 20 different languages (Machine Translation!) • Media: Link books, movies, products and maps together • Web: Web grows by 1 B+ pages per day • Representation: Language- and media-independent (Wiki?)
XRay
Scalable Algorithms & Services Setting • No limitations on model size and data size! Challenges • Distributed: Parameters need to be distributed • Fault Tolerance: Data and model chunks might fail • Simplicity: Zero-parameter algorithms for engineers • Any-Time: Any-time convergence of algorithms • Resource Constraints: Learning algorithms that optimize under resource & budget constraints
Visual Bin Inspection • Event: Travel. John has been picked • Inbound Camera View • Outbound Camera View
Product Recommendations Setting • Discover products by recommending the right product, to the right customer, in the right place, at the right time Challenges • Scale: Hundreds of millions of users and products • Personalization: Product preferences may vary greatly across users, multiple user personas • Contextual: Strong dependency on appearance, location and context • Real-Time: Recommendations require low latency (<40 ms) • Data Sparsity: Vast majority of users make very few purchases • Cold Start: Recommendations for new users/products • Value: Long-term value of recommendation (e.g., repeat/co-orders)
Ranking Metrics • Definition: Quality of a ranking r if the user labeled the list l • Examples:
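The example formulas on this slide did not survive extraction. Since the later result plots report Recall@1 and Recall@4, here is a minimal sketch of Recall@k as one such metric over a ranking r and labels l. The function name and the dictionary-based label format are assumptions made for illustration.

```python
def recall_at_k(ranking, labels, k):
    """Fraction of relevant items that appear in the top k positions.

    ranking: item ids ordered best first.
    labels: dict mapping item id -> 1 (relevant) or 0 (not relevant).
    """
    total_relevant = sum(1 for value in labels.values() if value)
    if total_relevant == 0:
        return 0.0  # convention chosen here: no relevant items means zero recall
    hits = sum(1 for item in ranking[:k] if labels.get(item, 0))
    return hits / total_relevant


ranking = ["b", "a", "c", "d"]
labels = {"a": 1, "b": 0, "c": 1, "d": 0}
print(recall_at_k(ranking, labels, 1), recall_at_k(ranking, labels, 4))
```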
Elastic-Ranking Model • Per ranked list r and labels l: • M is the chosen ranking metric over r • L is the pairwise loss of ranking item i below item j in the list (hinge, logistic, etc.) • Special case: Pairwise loss for constant M.
Regularization at Scale • Data volume: Focus on one-pass algorithms • Training Speed: Focus on SGD variants (only one instance in memory at a time) • Prediction Speed: Focus on L1 regularization to get sparse models (elastic-net):
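The elastic-net formula itself is missing from the extracted slide. A common textbook form is sketched below; the per-example loss ℓ, the weight vector w, and the 1/2 scaling of the L2 term are assumptions, while λ1 and λ2 match the symbols used on the Pruned SGD slide.

```latex
\min_{w} \; \sum_{i=1}^{n} \ell(w; x_i, y_i)
  \;+\; \lambda_1 \lVert w \rVert_1
  \;+\; \frac{\lambda_2}{2} \lVert w \rVert_2^2
```

The L1 term drives individual weights exactly to zero (sparse models, hence fast prediction), while the L2 term keeps the remaining weights small.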
FOBOS (Duchi & Singer 2009) • Main Idea: • Step 1: Unconstrained gradient step • Step 2: Constrained optimization from Step 1 • Implementation
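The two steps can be sketched for an elastic-net penalty, where Step 2 has a closed-form solution (soft-thresholding followed by a shrinkage factor). This is a minimal illustration, not the implementation from the slide: the function name, the fixed step size `eta`, and the penalty form λ1‖w‖₁ + (λ2/2)‖w‖₂² are assumptions.

```python
import math


def fobos_step(weights, gradient, eta, lam1, lam2=0.0):
    """One FOBOS update for the penalty lam1 * |w|_1 + (lam2 / 2) * |w|_2^2.

    Step 1: plain gradient step on the loss, ignoring the penalty.
    Step 2: stay close to the Step 1 result while paying the penalty, which
            reduces per coordinate to soft-thresholding and rescaling.
    """
    updated = []
    for w, g in zip(weights, gradient):
        v = w - eta * g                           # Step 1: unconstrained gradient step
        shrunk = max(abs(v) - eta * lam1, 0.0)    # Step 2a: soft-threshold (L1 part)
        updated.append(math.copysign(shrunk, v) / (1.0 + eta * lam2))  # Step 2b: L2 part
    return updated


# With a zero gradient, only the penalty acts: small weights are set exactly to zero.
print(fobos_step([1.0, -0.05, 0.3], [0.0, 0.0, 0.0], eta=0.1, lam1=1.0))
```

Because Step 2 sets small coordinates exactly to zero, sparsity emerges during training rather than through a separate pruning pass.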
Pruned Stochastic Gradient Descent • Idea: • Set λ1 = 0 and λ2 = 0 • Compute stochastic gradient and apply • Prune small weights to zero after N iterations
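A minimal sketch of the idea, assuming a squared-error loss on a linear model and pruning every N updates. The slide does not say whether pruning happens once or periodically, nor how "small" is defined, so `prune_every`, `threshold`, and the loss are all hypothetical choices.

```python
def pruned_sgd(examples, dim, eta=0.1, prune_every=100, threshold=0.01, passes=1):
    """Unregularized SGD on squared error, with periodic magnitude pruning.

    examples: list of (features, target) pairs, features being a list of length dim.
    Every prune_every updates, weights with magnitude below threshold are set to zero.
    """
    weights = [0.0] * dim
    step = 0
    for _ in range(passes):
        for features, target in examples:
            prediction = sum(w * x for w, x in zip(weights, features))
            error = prediction - target
            # Gradient of 0.5 * error^2 with respect to each weight is error * x.
            weights = [w - eta * error * x for w, x in zip(weights, features)]
            step += 1
            if step % prune_every == 0:
                weights = [w if abs(w) >= threshold else 0.0 for w in weights]
    return weights


# The second feature carries almost no signal, so its weight stays tiny and is pruned.
data = [([1.0, 0.001], 2.0)] * 200
print(pruned_sgd(data, dim=2))
```

Only one example is held in memory at a time, in line with the one-pass, SGD-based constraints stated on the Regularization at Scale slide.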
Large-Scale Widget Data (1 Pass) [Figure: relative improvement of Recall@1 (y-axis, 92% to 117%) versus sparsity (# of non-zero parameters, x-axis, 1 to 10000) for PSGD and FOBOS]
Large-Scale Widget Data (2 Pass) [Figure: relative improvement of Recall@1 (y-axis, 92% to 117%) versus sparsity (# of non-zero parameters, x-axis, 1 to 10000) for PSGD and FOBOS]
Large-Scale Widget Data (1 Pass) [Figure: relative improvement of Recall@4 (y-axis, 96% to 105%) versus sparsity (# of non-zero parameters, x-axis, 1 to 10000) for PSGD and FOBOS]
Large-Scale Widget Data (2 Pass) [Figure: relative improvement of Recall@4 (y-axis, 97% to 105%) versus sparsity (# of non-zero parameters, x-axis, 1 to 10000) for PSGD and FOBOS]
LETOR – TD 2003
LETOR – OHSUMED
Conclusions • Machine Learning is central to all products and customer experiences at Amazon! • Amazon has a large variety of human-labeled datasets, ranging from search, ads, forecasting, eBooks, and videos to device data! • Probability is a powerful tool, both for modeling and for mapping to distributed systems!