SOP for Machine Learning
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

SOP for ML
Does an SOP for ML exist?
- Yes!
- More or less…

Task Definition
Tasks using supervised learning:
- Classification? Regression? Ranking?
- Input? Output?
- Datasets available? How to collect, label, and validate them? How to pipeline the process?
- Objective functions: recognition rate? RMSE? Overall cost?
- Model screening:
  - Fixed input dimension → static models
  - Variable input dimension → dynamic models

Data Cleaning
Handling of missing values:
- Data selection by maximum coverage
- Data imputation:
  - By median or average
  - By ML (again!)
Data checkup (requires the aid of a visualization tool):
- Same-value features within a class
- Same-value feature entries
- Outlier removal via whisker plots
- Correlation analysis/plots
A minimal sketch of the imputation and whisker-plot steps is given below.
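
To make median imputation and whisker-plot (IQR) outlier removal concrete, here is a minimal pandas sketch; the DataFrame and the column name "feature" are hypothetical toy examples:

    import pandas as pd

    df = pd.DataFrame({"feature": [1.0, 2.0, None, 3.0, 100.0]})

    # Impute missing values by the median of the column
    df["feature"] = df["feature"].fillna(df["feature"].median())

    # Remove outliers outside the whiskers: [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = df["feature"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["feature"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]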

Data Preprocessing
Converting unstructured data to structured data:
- Texts:
  - One-hot encoding
  - Word hashing
  - Word to vector (word2vec)
  - TF-IDF & SVD
  - …
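
As a concrete instance of the TF-IDF & SVD route (i.e., latent semantic analysis), here is a minimal scikit-learn sketch; the toy corpus is made up for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    corpus = ["machine learning is fun",
              "deep learning needs data",
              "data cleaning matters"]

    tfidf = TfidfVectorizer().fit_transform(corpus)   # sparse term-document matrix
    vectors = TruncatedSVD(n_components=2).fit_transform(tfidf)  # dense low-dim vectors
    print(vectors.shape)  # (3, 2)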

Data Partitioning
Data partitioning (e.g., for k-fold cross validation):
- Training set: for model construction
- Validation set: for model evaluation & selection
- Test set: for final model evaluation
The three sets partition the whole dataset and must be disjoint!
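
A minimal sketch of a disjoint train/validation/test split with scikit-learn; X and y are hypothetical feature/label arrays, and the 60/20/20 ratio is just an example:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=100, random_state=0)

    # First carve out 20% as the held-out test set...
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    # ...then split the rest into training (60% overall) and validation (20% overall)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)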

Criteria for Data Partitioning
Criteria:
- Stratified k-fold cross validation
- Reasonable data in each fold for k-fold cross validation
  - For example, leave-one-person-out CV for handwritten digit recognition
- Strict disjointness between training/validation sets and the test set
- Temporal order: use sliding/expanding windows for time series
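
A minimal sketch of two of these criteria with scikit-learn's splitters; X and y are hypothetical toy arrays:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit

    X = np.arange(20).reshape(10, 2)
    y = np.array([0, 1] * 5)

    # Stratified folds preserve the class ratio in each fold
    for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X, y):
        pass  # train/evaluate here

    # Time-series splits keep the temporal order (expanding training window)
    for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(X):
        pass  # train on the past, validate on the future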

Classical ML Models
Try out basic models:
- Static models:
  - K-nearest-neighbor classifiers (KNNC)
  - Naïve Bayes & quadratic classifiers
  - Support vector classifiers
  - Random forests & XGBoost
- Dynamic models:
  - HMM (hidden Markov models)
  - RNN (recurrent neural networks)
  - LSTM (long short-term memory)
Check feature feasibility:
- Training vs. validation accuracy
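
A minimal sketch of screening several static models by cross-validated accuracy; the iris dataset is a toy stand-in for your own data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    models = {
        "KNN": KNeighborsClassifier(),
        "Naive Bayes": GaussianNB(),
        "SVC": SVC(),
        "Random forest": RandomForestClassifier(random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f}")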

Performance Metrics
Possible performance metrics (aka figures of merit):
- Classification:
  - Recognition rate
  - Precision, recall, and F-measure
  - AUC (area under the curve, mostly for binary classifiers):
    - ROC (receiver operating characteristic) curve
    - PRC (precision-recall curve)
  - Cross-entropy
  - …
- Regression:
  - RMSE (root mean squared error)
  - SSE (sum of squared errors)
  - R-squared
  - …
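
A minimal sketch of computing several of these metrics with scikit-learn; the labels, predictions, and scores are made-up toy values:

    from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                                 roc_auc_score, mean_squared_error, r2_score)

    y_true, y_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
    y_score = [0.1, 0.9, 0.4, 0.2, 0.8]              # predicted probabilities

    print(accuracy_score(y_true, y_pred))            # recognition rate
    print(precision_recall_fscore_support(y_true, y_pred, average="binary"))
    print(roc_auc_score(y_true, y_score))            # area under the ROC curve

    y_reg_true, y_reg_pred = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
    print(mean_squared_error(y_reg_true, y_reg_pred) ** 0.5)  # RMSE
    print(r2_score(y_reg_true, y_reg_pred))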

Data Size Reduction
Dimensionality reduction:
- Feature selection:
  - Sequential forward selection
  - Exhaustive search
  - Some models have built-in feature selection, e.g., random forests and XGBoost.
- Feature transformation:
  - Principal component analysis (PCA)
  - Linear discriminant analysis (LDA)
  - PCA + LDA
Data count reduction:
- Data condensing
- Data editing
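
A minimal sketch of the PCA + LDA combination as a scikit-learn pipeline; the iris dataset and the chosen component counts are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline

    X, y = load_iris(return_X_y=True)
    # PCA first removes noisy directions; LDA then maximizes class separation
    pca_lda = make_pipeline(PCA(n_components=3),
                            LinearDiscriminantAnalysis(n_components=2))
    X_reduced = pca_lda.fit_transform(X, y)
    print(X_reduced.shape)  # (150, 2)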

Avoidance of Overfitting
Model structure identification (aka model complexity estimation):
- Classification: no. of clusters in k-means clustering for KNNC; no. of mixtures in GMM
- Regression: no. of hidden layers in NN
Regularization:
- Use a regularization term in the objective function (for instance, the L2 norm of the weights in an NN)
- Dropout in NN
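
A minimal sketch of L2 regularization in practice: scikit-learn's MLP applies an L2 penalty on the weights through its alpha parameter. The dataset and the hyper-parameter values are toy choices:

    from sklearn.datasets import load_iris
    from sklearn.neural_network import MLPClassifier

    X, y = load_iris(return_X_y=True)
    # Larger alpha = stronger L2 penalty = smoother, less overfit model
    clf = MLPClassifier(hidden_layer_sizes=(16,), alpha=1e-2,
                        max_iter=2000, random_state=0)
    clf.fit(X, y)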

Summary: SOP for ML
Data processing:
- Dataset collection
- Missing value imputation
- Outlier removal
- Feature normalization
- Visualization
Feature engineering:
- Feature selection
- Feature transformation
Model construction:
- Classifier structure identification & parameter identification
- Objective function
Performance evaluation:
- Error analysis
- Confusion matrix
- DET/ROC plots
- PR plot

One Step Further
AutoML:
- Machine learning for automated algorithm design
Keys:
- Transfer learning
- Learning to learn
- Hyper-parameter tuning
Resources:
- ML4AAD (Machine Learning for Automated Algorithm Design)
- Google's Cloud AutoML (official site & media coverage)
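
A minimal sketch of one AutoML building block, automated hyper-parameter tuning, using scikit-learn's grid search; the model and parameter grid are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    # Exhaustively try each parameter combination with 5-fold CV
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)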