Machine Learning 101
Christina Brasco

Outline
• What is a Neural Network?
• How do Artificial Neural Networks work?
• When do you use Neural Networks?
• Neural Network Walkthrough
What can you do with a neural network? Regression and Classification…
Image Processing and Recognition…
More Image Processing…
Terms to know
• Feature: a numerical or categorical descriptor of a data point (x1, x2, …, xn)
• Label: a categorical variable that describes group membership of a data point (y)
• Function: a mathematical relationship between features and labels (y = f(x1, x2, …, xn))
What is Machine Learning?
Machine learning is a field at the intersection of mathematics and computer science, in which algorithms are developed that enable computers to learn complex patterns without explicit programming.
• Unsupervised Learning
• Supervised Learning
Biological Neural Networks
A biological neuron transmits information through electrochemical signals received by its input synapses (dendrites) and sent out through its output synapses (axon terminals).
In a biological neural network, information is transmitted electrochemically between neurons, and individual neurons learn by growing more synapses.
Artificial Neural Networks
An artificial neuron imitates the structure of a biological neuron.
An artificial neural network connects many artificial neurons, imitating the structure of the brain.
The human brain has roughly 86 billion neurons and 150 trillion functional connections! By comparison, the biggest artificial neural networks around today (Google, LLNL, Facebook, etc.) have somewhere on the order of 10 billion functional connections.
How do artificial neurons predict?
y = f(sum)
sum = x1·w1 + x2·w2 + … + xn·wn + b
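The prediction rule above can be sketched in a few lines of Python. The sigmoid activation and the example inputs are illustrative choices, not taken from the slides:

```python
import math

def sigmoid(z):
    # A common activation f: squashes any real-valued sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron_predict(x, w, b):
    # sum = x1*w1 + x2*w2 + ... + xn*wn + b, then y = f(sum)
    total = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(total)

# Example: two features, two weights, one bias
print(neuron_predict([1.0, 2.0], [0.5, -0.25], 0.1))
```

Here the weighted sum is 1.0·0.5 + 2.0·(−0.25) + 0.1 = 0.1, and the sigmoid of 0.1 is roughly 0.525.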
How do artificial neurons learn those weights?
• A loss function, like the Mean Squared Error (MSE), is minimized to find the best possible solution
• Since loss functions like MSE can be very complex and non-linear, we iteratively approach the minimum using Gradient Descent
• GD is a greedy algorithm that finds a local minimum – eliminate bias by randomizing the initial weights
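Those three bullets can be shown end to end on a one-weight toy problem: a randomly initialized weight, an MSE loss, and gradient-descent steps toward the minimum. The data (drawn from y = 2x), seed, and learning coefficient are illustrative assumptions:

```python
import random

# Toy data generated from the "true" function y = 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

random.seed(0)
w = random.uniform(-1, 1)   # randomized initial weight, per the slide
alpha = 0.05                # learning coefficient

for step in range(200):
    # MSE: L = (1/N) * sum((w*x - y)^2); its gradient with respect to w:
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= alpha * grad       # step downhill along the gradient

print(round(w, 3))  # converges to the true slope, 2.0
```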
Learning through Backpropagation
• At each step, each weight is updated according to a learning coefficient α (how quickly we want the network to learn) and the gradient of the loss function with respect to that weight, moving it slowly closer to its optimal value:
  Wi = Wi-1 − α∇f(Wi-1)
• This works from the end backwards through every layer of the neural network
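The update formula, on its own, is a one-liner. A minimal sketch on the toy loss f(w) = w², whose gradient is 2w (the loss and the α = 0.1 default are illustrative assumptions):

```python
def gd_update(w_prev, grad, alpha=0.1):
    # Wi = Wi-1 - alpha * grad f(Wi-1): step each weight against its gradient
    return w_prev - alpha * grad

# One update on f(w) = w**2, whose gradient at w is 2*w:
w = 3.0
w = gd_update(w, 2 * w)
print(w)  # 3.0 - 0.1 * 6.0 = 2.4
```

Backpropagation is what supplies the `grad` argument for every weight in every layer, working backwards from the output.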
Advantages/Disadvantages
Advantages of Neural Networks
• Universal Function Approximator
• Usable for very complex problems (vision and text learning)
• This is a growing field, with a lot to learn!
Disadvantages of Neural Networks
• Can seem like a “black box”
• Computationally expensive
• Prone to overfitting (fix: enough data)
• Even experts don’t understand why some applications are possible
Walkthrough: Setup
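The original walkthrough was graphical, so here is one possible setup in Python, assuming a small 2-2-1 network (2 inputs, 2 hidden neurons, 1 output); the seed, the training example, and the network shape are all illustrative assumptions:

```python
import random

random.seed(42)

# Hidden layer: 2 neurons, each with 2 weights and a bias
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
# Output layer: 1 neuron with 2 weights and a bias
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

# One labeled training example: features x, label y
x = [0.5, -0.2]
y = 1.0
```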
Walkthrough: Feed-Forward
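A feed-forward pass applies the prediction rule y = f(sum) layer by layer. This self-contained sketch reuses the hypothetical 2-2-1 setup (seed, weights, and inputs are illustrative, not from the slides):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(42)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0
x = [0.5, -0.2]

# Hidden layer: each neuron computes f(x1*w1 + x2*w2 + b)
h = [sigmoid(sum(xi * wi for xi, wi in zip(x, row)) + b)
     for row, b in zip(W1, b1)]

# Output layer: the same rule, applied to the hidden activations
y_hat = sigmoid(sum(hi * wi for hi, wi in zip(h, W2)) + b2)
print(y_hat)  # a prediction between 0 and 1
```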
Walkthrough: Calculating Derivatives
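The derivatives come from the chain rule, working backwards from the loss. A sketch for the same hypothetical 2-2-1 network, assuming a squared-error loss L = (ŷ − y)² and using the sigmoid's derivative s·(1 − s):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(42)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0
x, y = [0.5, -0.2], 1.0

# Forward pass first: the derivatives are evaluated at these activations
h = [sigmoid(sum(xi * wi for xi, wi in zip(x, row)) + b)
     for row, b in zip(W1, b1)]
y_hat = sigmoid(sum(hi * wi for hi, wi in zip(h, W2)) + b2)

# Chain rule, from the end backwards:
delta_out = 2 * (y_hat - y) * y_hat * (1 - y_hat)   # dL/d(output sum)
dW2 = [delta_out * hi for hi in h]                   # dL/dW2
delta_hid = [delta_out * w2 * hi * (1 - hi)          # dL/d(hidden sums)
             for w2, hi in zip(W2, h)]
dW1 = [[d * xi for xi in x] for d in delta_hid]      # dL/dW1
```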
Walkthrough: Gradient Descent
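Putting the walkthrough together: feed-forward, backpropagated derivatives, and the update Wi = Wi-1 − α∇f(Wi-1) in a loop. Everything here (network shape, seed, single training example, α, step count) is an illustrative assumption:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(42)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0
x, y = [0.5, -0.2], 1.0
alpha = 0.5  # learning coefficient

for step in range(500):
    # Feed-forward
    h = [sigmoid(sum(xi * wi for xi, wi in zip(x, row)) + b)
         for row, b in zip(W1, b1)]
    y_hat = sigmoid(sum(hi * wi for hi, wi in zip(h, W2)) + b2)
    # Backpropagated derivatives of L = (y_hat - y)^2
    delta_out = 2 * (y_hat - y) * y_hat * (1 - y_hat)
    delta_hid = [delta_out * w2 * hi * (1 - hi) for w2, hi in zip(W2, h)]
    # Gradient-descent updates: W <- W - alpha * dL/dW
    W2 = [w - alpha * delta_out * hi for w, hi in zip(W2, h)]
    b2 -= alpha * delta_out
    W1 = [[w - alpha * d * xi for w, xi in zip(row, x)]
          for row, d in zip(W1, delta_hid)]
    b1 = [b - alpha * d for b, d in zip(b1, delta_hid)]

print(y_hat)  # moves toward the label y = 1.0 as the loss shrinks
```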