General Guidance Hung-yi Lee 李宏毅
Framework of ML
- Training data: pairs of inputs and labels; testing data: inputs only.
- Example tasks: speech recognition (audio to transcript), image recognition (image to class), speaker recognition (audio to speaker identity), machine translation (e.g., Japanese 「痛みを知れ」 to Chinese 「了解痛苦吧」, "know pain").
Framework of ML
- Training, in three steps:
  - Step 1: write down a function with unknown parameters.
  - Step 2: define a loss from the training data.
  - Step 3: optimization.
- Testing: run the trained function on the testing data and upload the predictions to Kaggle.
(A minimal sketch of the three steps follows.)
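As a rough illustration of the three steps, here is a minimal PyTorch-style sketch; the architecture, data, and hyperparameters are placeholders, not the ones used in the course homeworks.

```python
import torch
import torch.nn as nn

x_train = torch.randn(100, 8)   # placeholder inputs
y_train = torch.randn(100, 1)   # placeholder labels

# Step 1: a function with unknown parameters (the network weights)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

# Step 2: define loss from training data
criterion = nn.MSELoss()

# Step 3: optimization (gradient descent on the unknown parameters)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()
```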
General Guide
- Loss on training data is large:
  - Model bias: make your model more complex.
  - Optimization issue: covered in the next lecture.
- Loss on training data is small but loss on testing data is large:
  - Overfitting: get more training data (not possible in the HWs), use data augmentation, or make your model simpler; note the trade-off with model bias.
  - Mismatch: not an issue in the HWs, except HW 11.
- Loss on testing data is small: done.
Split your training data into a training set and a validation set for model selection. (A rough sketch of this decision procedure follows the list.)
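Read as code, the flowchart is roughly the following decision procedure; this is only an illustrative sketch, and the `acceptable` threshold is an arbitrary stand-in, not a value from the lecture.

```python
def diagnose(train_loss, test_loss, acceptable=0.1):
    """Illustrative reading of the general-guide flowchart."""
    if train_loss > acceptable:
        # Large training loss: either model bias or an optimization
        # issue; compare against shallower models to tell them apart.
        return "model bias or optimization issue"
    if test_loss > acceptable:
        # Small training loss but large testing loss: overfitting
        # or a train/test mismatch.
        return "overfitting or mismatch"
    return "done"
```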
Model Bias
- The model is too simple: the set of candidate functions is too small to contain any function with small loss. It is like looking for a needle in a haystack when there is no needle.
- Solution: redesign your model to make it more flexible, e.g., more features, or deep learning (more neurons and layers).
Optimization Issue
- Large training loss does not always imply model bias. There is another possibility: a needle is in the haystack, but optimization just cannot find it.
Model Bias or Optimization Issue: which one?
- Model bias: there is no needle in the haystack; no candidate function achieves small loss.
- Optimization issue: a needle is in the haystack; we just cannot find it.
Model Bias v.s. Optimization Issue (ref: http://arxiv.org/abs/1512.03385)
- Gain insight from comparison. Larger loss for a deeper network on the testing data looks like overfitting, but the referenced paper shows a 56-layer network with larger error than a 20-layer network on the training data as well, which overfitting cannot explain; it is an optimization issue.
Optimization Issue (ref: http://arxiv.org/abs/1512.03385)
- Gain insights from comparison: start from shallower networks (or other models), which are easier to optimize.
- If deeper networks do not obtain smaller loss on the training data, there is an optimization issue.

Training loss on the 2017–2020 view-count data:

| Network  | Training loss |
|----------|---------------|
| 1 layer  | 0.28k         |
| 2 layers | 0.18k         |
| 3 layers | 0.14k         |
| 4 layers | 0.10k         |
| 5 layers | 0.34k         |

The 5-layer network has larger training loss than the 4-layer one, so the problem is optimization, not model bias.
- Solution: more powerful optimization technology (next lecture). (A sketch of this depth-sweep diagnostic follows.)
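A sketch of the diagnostic under stated assumptions: train otherwise-identical MLPs of increasing depth on the same data and compare their training losses. The architecture, data, and training loop below are illustrative placeholders, not the lecture's actual setup.

```python
import torch
import torch.nn as nn

def make_mlp(depth, width=16, d_in=8):
    """Build an MLP with `depth` hidden layers."""
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

x, y = torch.randn(256, 8), torch.randn(256, 1)  # placeholder data
for depth in [1, 2, 3, 4, 5]:
    model = make_mlp(depth)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # If a deeper model ends with *larger* training loss than a
    # shallower one, suspect optimization rather than model bias.
    print(f"{depth}-layer training loss: {loss.item():.3f}")
```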
Overfitting
- Small loss on training data, large loss on testing data. Why?
- An extreme example: define a function that returns the stored label whenever the input appears in the training data, and outputs something random otherwise (written out below). It achieves zero training loss but large testing loss; such a function is less than useless.
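Written out, with training data $\{(x^1, \hat{y}^1), \dots, (x^N, \hat{y}^N)\}$, the extreme example is:

$$
f(x) =
\begin{cases}
\hat{y}^i & \text{if } x = x^i \text{ for some } i \\
\text{random} & \text{otherwise}
\end{cases}
$$

It interpolates every training point exactly, so its training loss is zero; on any unseen input its output is arbitrary, so its testing loss is large.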
Overfitting
- With a very flexible, "freestyle" model, the learned function can pass through every training point exactly yet behave arbitrarily in between, so it incurs large loss on testing data drawn from the real data distribution (which is not observable; we only see the training data).
Overfitting
- One remedy for a too-flexible model is more training data (you cannot collect more in the HWs), or data augmentation (which you can do in the HWs): create extra training examples by transforming the ones you have, as sketched below.
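For image tasks, a common way to implement data augmentation is torchvision transforms; the specific transforms and sizes below are illustrative assumptions, not the homeworks' settings.

```python
import torchvision.transforms as T

# Label-preserving transformations: a flipped or cropped cat is
# still a cat, so these effectively enlarge the training set.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),                    # mirror the image
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),   # random crop, then resize
    T.ToTensor(),
])
```

Augmentations should be chosen sensibly: a horizontal flip preserves most image labels, but a transformation that produces inputs the test distribution never contains (e.g., upside-down photos) can hurt rather than help.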
Overfitting
- Another remedy is a more constrained model: with fewer candidate functions, the model cannot contort itself to fit the training points arbitrarily, so what it learns stays closer to the real data distribution (which is not observable) and generalizes better to the testing data.
Overfitting: ways to constrain the model
- Fewer parameters, or sharing parameters (a CNN is a constrained version of a fully-connected network).
- Fewer features.
- Early stopping.
- Regularization.
- Dropout.
(Dropout, regularization, and early stopping are sketched in code after this list.)
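A minimal sketch of three of these constraints in PyTorch: dropout, L2 regularization via `weight_decay`, and early stopping on a validation set. The data, layer sizes, and patience value are illustrative assumptions.

```python
import torch
import torch.nn as nn

x_tr, y_tr = torch.randn(200, 8), torch.randn(200, 1)  # placeholder data
x_va, y_va = torch.randn(50, 8), torch.randn(50, 1)

model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Dropout(p=0.5),                     # dropout
    nn.Linear(16, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3,
                       weight_decay=1e-4)  # L2 regularization

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_tr), y_tr)
    loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = nn.functional.mse_loss(model(x_va), y_va).item()
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:         # early stopping
            break
```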
Overfitting
- But do not constrain the model too much, or you are back to model bias: an over-constrained model can no longer represent the real data distribution at all.
Bias-Complexity Trade-off
- As the model becomes more complex (e.g., more features, more parameters), the training loss keeps decreasing, but the testing loss first decreases and then rises again. Select the model whose testing loss is minimal, as in the sketch below.
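A small sketch of the trade-off, using polynomial degree as the complexity knob: as the degree grows, training mse keeps falling while held-out mse eventually rises, and we keep the degree with the smallest held-out mse. The synthetic data and degree range are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

best = None
for degree in range(1, 11):                  # model complexity
    coeffs = np.polyfit(x_tr, y_tr, degree)  # fit on the training split
    val_mse = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    if best is None or val_mse < best[1]:
        best = (degree, val_mse)             # keep the lowest held-out mse
print("selected degree:", best[0])
```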
Homework
- The testing set is split into a public part and a private part. Suppose on the public testing set Model 1 gets mse = 0.9, Model 2 gets mse = 0.7, and Model 3 gets mse = 0.5, so you pick Model 3. On the private testing set its mse can still be larger than 0.5 and may be poor.
- The extreme example again: a model that is essentially random can score well on the public set by luck, yet remain random on the private testing set. See http://www.chioka.in/how-to-select-your-final-models-in-a-kaggle-competitio/.
- This also explains why machines usually beat humans on benchmark corpora: the reported model was selected for its score on that very benchmark.
Cross Validation
- How should you split your data? Split the training set into a training set and a validation set, and use the validation results to select your model.
- Using the results on the public testing data to select your model is not recommended: you are just making your public score better than your private score.
N-fold Cross Validation
- Split the training set into N folds. Train each candidate model N times, each time holding out a different fold for validation, and average the validation mse. For example, with 3 folds:

| Model   | Fold 1 | Fold 2 | Fold 3 | Avg mse |
|---------|--------|--------|--------|---------|
| Model 1 | 0.2    | 0.4    | 0.3    | 0.3     |
| Model 2 | 0.4    | 0.5    | 0.6    | 0.5     |
| Model 3 | 0.3    | 0.6    | 0.3    | 0.4     |

- Model 1 has the lowest average mse, so select it, retrain it on the full training set, and submit its predictions on the (public and private) testing set. A code sketch follows.
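A sketch of 3-fold cross validation for model selection, assuming scikit-learn is available; the candidate models and data below are placeholders, not the homework setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

X, y = np.random.randn(90, 5), np.random.randn(90)  # placeholder data
candidates = {"Model 1": Ridge(alpha=0.1),
              "Model 2": Ridge(alpha=1.0),
              "Model 3": Ridge(alpha=10.0)}

for name, model in candidates.items():
    fold_mse = []
    for tr_idx, va_idx in KFold(n_splits=3).split(X):
        model.fit(X[tr_idx], y[tr_idx])      # train on two folds
        pred = model.predict(X[va_idx])      # validate on the held-out fold
        fold_mse.append(np.mean((pred - y[va_idx]) ** 2))
    print(name, "avg mse:", np.mean(fold_mse))
```

Pick the candidate with the lowest average mse, then retrain it on the whole training set before running it on the testing data.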
Let's predict the number of views on 2/26! (Red: real views; blue: predicted.)

| Network  | Loss on 2017–2020 (training) | Loss on 2021 (testing) |
|----------|------------------------------|------------------------|
| 1 layer  | 0.28k                        | 0.43k                  |
| 2 layers | 0.18k                        | 0.39k                  |
| 3 layers | 0.14k                        | 0.38k                  |
| 4 layers | 0.10k                        | 0.44k                  |

On 2/26 itself every model is far off (e = 2.58k). No amount of model tuning fixes this; the cause is a mismatch between the training and testing distributions, discussed next.
Mismatch
- The training and testing data have different distributions. Be aware of how the data is generated: in this case, simply increasing the training data will not help. Most HWs do not have this problem, except HW 11.