Machine Learning: Neural Networks
Slides mostly adapted from Tom Mitchell, Han and Kamber
Artificial Neural Networks
- Computational models inspired by the human brain: algorithms that try to mimic the brain.
- Massively parallel, distributed systems made up of simple processing units (neurons).
- Synaptic connection strengths among neurons are used to store the acquired knowledge.
- Knowledge is acquired by the network from its environment through a learning process.
History
- Late 1800s: neural networks appear as an analogy to biological systems.
- 1960s and 70s: simple neural networks appear, then fall out of favor because the perceptron is not effective by itself and there were no good algorithms for multilayer nets.
- 1986: the backpropagation algorithm appears; neural networks have a resurgence in popularity, though training is more computationally expensive.
Applications of ANNs
ANNs have been widely used in various domains for:
- Pattern recognition
- Function approximation
- Associative memory
Properties
- Inputs are flexible: any real values, highly correlated or independent.
- Target function may be discrete-valued, real-valued, or vectors of discrete or real values.
- Outputs are real numbers between 0 and 1.
- Resistant to errors in the training data.
- Long training time, but fast evaluation.
- The function produced can be difficult for humans to interpret.
When to consider neural networks
- Input is high-dimensional, discrete or real-valued (e.g., raw sensor input)
- Output is discrete or real-valued, or a vector of values
- Possibly noisy data
- Form of target function is unknown
- Human readability of the result is not important
Examples:
- Speech phoneme recognition
- Image classification
- Financial prediction
A Neuron (= a perceptron)
[Figure: input vector x = (x0, ..., xn) with weight vector w = (w0, ..., wn), bias -t, weighted sum Σ, activation function f, output y]
The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping.
24 November 2020 Data Mining: Concepts and Techniques
Perceptron
- Basic unit in a neural network; a linear separator.
- Parts:
  - n inputs, x1 ... xn
  - Weights for each input, w1 ... wn
  - A bias input x0 (constant) with associated weight w0
  - Weighted sum of inputs, y = w0 x0 + w1 x1 + ... + wn xn
  - A threshold or activation function: output 1 if y > t, -1 if y <= t
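The parts listed above can be sketched in a few lines of code. This is a minimal illustration, not a library implementation; the weights, inputs, and threshold below are made-up values chosen for the example.

```python
def perceptron(x, w, t):
    """Weighted sum of inputs followed by a threshold activation."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # y = w0*x0 + w1*x1 + ... + wn*xn
    return 1 if y > t else -1                 # threshold function

# The bias is handled as a constant input x0 = 1 with weight w0.
x = [1.0, 2.0, -1.0]   # x0 (bias input), x1, x2 -- illustrative values
w = [0.5, 1.0, 2.0]    # w0, w1, w2 -- illustrative weights
print(perceptron(x, w, t=0.0))  # weighted sum = 0.5 + 2.0 - 2.0 = 0.5 > 0, so output 1
```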
Artificial Neural Networks (ANN)
- The model is an assembly of inter-connected nodes and weighted links.
- An output node sums its input values according to the weights of its links.
- The output is compared against a threshold t (the perceptron model).
Types of connectivity
- Feedforward networks: these compute a series of transformations. Typically, the first layer is the input and the last layer is the output.
- Recurrent networks: these have directed cycles in their connection graph. They can have complicated dynamics and are more biologically realistic.
[Figure: input units, hidden units, output units]
Different Network Topologies
- Single-layer feed-forward networks: the input layer projects directly into the output layer.
[Figure: single-layer network with input layer and output layer]
Different Network Topologies
- Multi-layer feed-forward networks: one or more hidden layers. Each layer receives input only from the previous layer.
[Figure: 2-layer (1-hidden-layer) fully connected network with input, hidden, and output layers]
Different Network Topologies
- Multi-layer feed-forward networks
[Figure: network with input layer, multiple hidden layers, and output layer]
Different Network Topologies
- Recurrent networks: a network with feedback, where some of its outputs are connected back to some of its inputs (discrete time).
[Figure: recurrent network with input and output layers]
Algorithm for learning ANN
- Initialize the weights (w0, w1, ..., wk).
- Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples.
- Error function: E(w) = Σ_i [ y_i - f(w, x_i) ]^2
- Find the weights wi that minimize the above error function, e.g., by gradient descent (the backpropagation algorithm).
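The steps above can be sketched for the simplest case: a single linear unit f(w, x) = w · x trained by stochastic gradient descent on the squared error (the LMS / delta rule). The learning rate, epoch count, and training data below are illustrative assumptions, not values from the slides.

```python
def train_linear_unit(data, n_weights, lr=0.05, epochs=200):
    """Stochastic gradient descent on squared error for a linear unit."""
    w = [0.0] * n_weights                       # initialize the weights
    for _ in range(epochs):
        for x, target in data:
            out = sum(wi * xi for wi, xi in zip(w, x))
            err = target - out
            # Gradient of 0.5 * err^2 w.r.t. wi is -err * xi,
            # so the descent step adds lr * err * xi to each weight.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Illustrative data generated from y = 2*x1 - x2 (x0 = 1 is the bias input).
data = [([1, 1, 0], 2), ([1, 0, 1], -1), ([1, 1, 1], 1), ([1, 2, 1], 3)]
w = train_linear_unit(data, n_weights=3)       # w approaches [0, 2, -1]
```

For a multi-layer network with nonlinear units, the same idea is applied layer by layer, which is the backpropagation algorithm described later.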
Optimizing concave/convex functions
- Maximum of a concave function = minimum of a convex function.
- Gradient ascent (concave) / gradient descent (convex).
- Gradient ascent rule: repeatedly update wi ← wi + η ∂E/∂wi, stepping in the direction of the gradient.
Decision surface of a perceptron
- The decision surface is a hyperplane: a perceptron can capture linearly separable classes.
- For non-linearly separable classes, use a network of perceptrons.
Multi-layer Networks
- Linear units are inappropriate: a multi-layer network of linear units is no more expressive than a single layer.
- Introduce non-linearity: the threshold function is not differentiable, so use the sigmoid function instead.
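The sigmoid mentioned above is a smooth, differentiable replacement for the hard threshold. A minimal sketch:

```python
import math

def sigmoid(y):
    """Smooth squashing function: maps any real y into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y):
    s = sigmoid(y)
    return s * (1.0 - s)  # the convenient form used by backpropagation

# sigmoid(0) = 0.5; large positive inputs approach 1, large negative approach 0.
```

The product form of the derivative, s(1 - s), is what makes the weight-update formulas in backpropagation cheap to compute.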
Backpropagation
- Iteratively process a set of training tuples and compare the network's prediction with the actual known target value.
- For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value.
- Modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer, hence "backpropagation".
- Steps:
  - Initialize weights (to small random numbers) and biases in the network.
  - Propagate the inputs forward (by applying the activation function).
  - Backpropagate the error (by updating weights and biases).
  - Check the terminating condition (when the error is very small, etc.).
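The four steps above can be sketched for a network with one hidden layer and one sigmoid output unit, trained on squared error. This is a minimal illustration under assumed settings: the hidden-layer size, learning rate, epoch count, and the XOR training set are all choices made for the example, not values from the slides.

```python
import math, random

random.seed(0)

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def train_xor(epochs=20000, lr=0.5, n_hidden=4):
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    # Step 1: initialize weights and biases to small random numbers.
    w_h = [[random.uniform(-0.5, 0.5) for _ in range(3)]
           for _ in range(n_hidden)]                         # each row: [bias, w1, w2]
    w_o = [random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in data:
            xb = [1] + x                                     # prepend bias input
            # Step 2: propagate the inputs forward.
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in w_h]
            hb = [1] + h
            o = sigmoid(sum(w * hi for w, hi in zip(w_o, hb)))
            # Step 3: backpropagate the error (delta terms use s * (1 - s)).
            err_o = o * (1 - o) * (target - o)
            err_h = [h[j] * (1 - h[j]) * w_o[j + 1] * err_o
                     for j in range(n_hidden)]
            w_o = [w + lr * err_o * hi for w, hi in zip(w_o, hb)]
            w_h = [[w + lr * err_h[j] * xi for w, xi in zip(w_h[j], xb)]
                   for j in range(n_hidden)]
    # Step 4 (terminating condition) is simplified here to a fixed epoch count.
    return w_h, w_o

def predict(w_h, w_o, x):
    xb = [1] + x
    hb = [1] + [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in w_h]
    return sigmoid(sum(w * hi for w, hi in zip(w_o, hb)))

w_h, w_o = train_xor()
```

XOR is used here because it is the classic function a single perceptron cannot represent, so it exercises the hidden layer.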
How a Multi-Layer Neural Network Works
- The inputs to the network correspond to the attributes measured for each training tuple.
- Inputs are fed simultaneously into the units making up the input layer.
- They are then weighted and fed simultaneously to a hidden layer.
- The number of hidden layers is arbitrary, although it is usually one.
- The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction.
- The network is feed-forward: none of the weights cycles back to an input unit or to an output unit of a previous layer.
- From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function.
Defining a Network Topology
- First decide the network topology: the number of units in the input layer, the number of hidden layers (if > 1), the number of units in each hidden layer, and the number of units in the output layer.
- Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0].
- For discrete-valued attributes, use one input unit per domain value, each initialized to 0.
- For classification with more than two classes, use one output unit per class.
- If a trained network's accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights.
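The normalization step above is plain min-max scaling of each attribute to [0.0, 1.0]. A minimal sketch, with illustrative attribute values:

```python
def min_max_normalize(values):
    """Min-max scale a list of attribute values to the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]               # illustrative attribute column
print(min_max_normalize(ages))        # [0.0, 0.25, 0.5, 1.0]
```

In practice the min and max are computed on the training set and reused to scale new tuples at prediction time, so that both are mapped through the same transformation.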
Backpropagation and Interpretability
- Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| * w) time, with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case.
- Rule extraction from networks: network pruning.
  - Simplify the network structure by removing weighted links that have the least effect on the trained network.
  - Then perform link, unit, or activation value clustering.
  - The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers.
- Sensitivity analysis: assess the impact that a given input variable has on a network output. The knowledge gained from this analysis can be represented in rules.
Neural Network as a Classifier
- Weaknesses:
  - Long training time.
  - Requires a number of parameters that are typically best determined empirically, e.g., the network topology or "structure".
  - Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and the "hidden units" in the network.
- Strengths:
  - High tolerance to noisy data.
  - Ability to classify untrained patterns.
  - Well suited for continuous-valued inputs and outputs.
  - Successful on a wide array of real-world data.
  - Algorithms are inherently parallel.
  - Techniques have been developed for extracting rules from trained neural networks.
Artificial Neural Networks (ANN)
Learning Perceptrons
A Multi-Layer Feed-Forward Neural Network
[Figure: input vector X feeding the input layer, a hidden layer connected by weights wij, the output layer, and the output vector]
General Structure of ANN
Training an ANN means learning the weights of its neurons.