DATA-MINING: Artificial Neural Networks. Alexey Minin, JASS 2006


Learning without a teacher: introduction. An unsupervised ANN forms its output by itself, based only on the information presented at its input. The main task is to choose a suitable functional and then minimize it; it is according to this functional that the network is adjusted. In practice, adaptive networks encode the input information in the most compact way, subject to some predefined requirements.

Learning without a teacher: redundancy of data. The length of a data description is determined by two things: the dimension of the data (the number of components of the input vector) and the capacity of the data (the number of bits defining the possible variety of all its values). Accordingly, there are two ways of coding (reducing) the information: reducing the dimension of the data with minimal loss (finding independent features), and reducing the variety of the data by detecting prototypes (clustering and quantization), as sketched below.
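For illustration, a minimal sketch of the two routes on the same data, using scikit-learn's PCA and KMeans (the library choice, data and parameter values are assumptions, not part of the original slides):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))          # 1000 samples, d = 8 components

    # Route 1: reduce the dimension -- keep m < d independent features.
    pca = PCA(n_components=2)
    X_low = pca.fit_transform(X)            # each sample now described by 2 numbers

    # Route 2: reduce the variety -- replace each sample by one of m prototypes.
    km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)
    codes = km.predict(X)                   # each sample now described by log2(16) = 4 bits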

Two ways to reduce the data. Reducing the dimension allows us to describe the data with fewer components. Clustering allows us to reduce the variety of the data, i.e. the number of bits we need to describe it. NB: the two types of algorithms can be combined. Kohonen maps do exactly this: the prototypes are arranged in a low-dimensional space, so that, for example, the input data can be mapped onto a 2-dimensional grid of prototypes and the data can be visualized.

Main idea: neuron as indicator. The neuron has one output and is trained on d-dimensional data. Let us say that the activation function is linear; the output is therefore a linear combination of the inputs. After training is finished, the output amplitude can serve as an indicator for the data, showing whether the data correspond to the training patterns or not.
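Written out explicitly (notation assumed here: x is the d-dimensional input vector, w the weight vector, y the scalar output):

    y = \sum_{i=1}^{d} w_i x_i = \mathbf{w}^{\top}\mathbf{x}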

Hebb training algorithm. According to Hebb, the weight of an input grows in proportion to the activity it produces. If we reformulate the task as an optimization problem, we obtain both the characteristic property of such a neuron and the functional E that has to be minimized. NB! If we drive E towards its minimum, the output amplitude grows without bound.
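A standard way to write the Hebb rule and the corresponding functional (a sketch with assumed notation: \eta is the learning rate, \langle\cdot\rangle averaging over the training set):

    \Delta w_i = \eta\, y\, x_i, \qquad E = -\tfrac{1}{2}\langle y^2 \rangle = -\tfrac{1}{2}\,\mathbf{w}^{\top}\langle \mathbf{x}\mathbf{x}^{\top}\rangle\,\mathbf{w}

Since E can be made arbitrarily negative simply by scaling w, pure Hebbian learning drives the weights, and hence the output amplitude, to infinity.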

Oja training rule. A decay term is added to stop the unlimited growth of the weights. Oja's rule maximizes the sensitivity of the output neuron while keeping the amplitude of the weights bounded. It is easy to convince oneself of this by setting the average change of the weights to zero and then multiplying the right-hand side of the equality by w: in equilibrium the weights of the trained neuron lie on a hypersphere. Under Oja training the weight vector settles on the hypersphere, in the direction that maximizes the projection of the input vectors.
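In formulas (a sketch; \eta is the learning rate):

    \Delta w_i = \eta\, y\,(x_i - y\, w_i)

Setting \langle \Delta w_i \rangle = 0, multiplying by w_i and summing over i gives \langle y^2 \rangle\,(1 - \|\mathbf{w}\|^2) = 0, i.e. \|\mathbf{w}\| = 1: the weight vector indeed lies on the unit hypersphere.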

Oja training rule, SUMMARY: the neuron tries to reproduce the value of its input for a known output. In other words, it maximizes the sensitivity of its output (the neuron-indicator) to the multi-dimensional input information, thereby compressing it. NB! The outputs of an Oja layer are linear combinations of the principal components; if you want to obtain the principal components themselves, the sum over the outputs in the rule has to be changed so that each neuron subtracts only the contributions of the preceding outputs.
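As a numerical illustration, a minimal single-neuron Oja sketch that extracts the first principal component of some synthetic 2-D data (the data, learning rate and step count are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    # Correlated 2-D data whose covariance has one dominant direction.
    X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])
    X -= X.mean(axis=0)

    w = rng.normal(size=2)
    eta = 0.01
    for x in X:
        y = w @ x                      # linear neuron output
        w += eta * y * (x - y * w)     # Oja's rule: Hebb term minus decay term

    print("|w| ~", np.linalg.norm(w))  # converges to 1
    print("w ~", w)                    # aligns with the leading eigenvector of the covariance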

Principal component analysis. Let us say that we have d-dimensional data on which we are training m linear neurons. THE TASK IS: we want the amplitudes of all output neurons to be independent indicators, fully reflecting the information contained in the multi-dimensional data.
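For a layer of m linear neurons the outputs are y_j = \sum_i w_{ji} x_i, and one standard modification of the learning rule that yields the individual principal components is Sanger's rule (given here as a sketch, not as the slide's exact formula):

    \Delta w_{ji} = \eta\, y_j\Bigl(x_i - \sum_{k \le j} y_k\, w_{ki}\Bigr)

Each neuron subtracts only the contributions of the preceding outputs, so neuron 1 converges to the first principal component, neuron 2 to the second, and so on.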

The requirement: the neurons must interact somehow (if we train them independently, we will obtain the same result for all of them). In the simplest case, take a perceptron with a linear hidden layer in which the number of inputs equals the number of outputs and the weights with the same indices in both layers are shared. Let us teach this ANN to reproduce its input at its output. The resulting training rule looks like Oja's rule!
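Spelled out for this tied-weight linear autoencoder (assumed notation: hidden outputs y_j, reconstructions \hat{x}_i):

    y_j = \sum_i w_{ji} x_i, \qquad \hat{x}_i = \sum_j w_{ji} y_j, \qquad E = \tfrac{1}{2}\sum_i (x_i - \hat{x}_i)^2

and gradient descent on the reconstruction error E, keeping the leading reconstruction term, gives an update of the form

    \Delta w_{ji} = \eta\, y_j\,(x_i - \hat{x}_i) = \eta\, y_j\Bigl(x_i - \sum_k y_k\, w_{ki}\Bigr)

which is Oja's rule generalized to a whole layer (the subspace rule).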

Self-training layer: in our formulation, the training of a single neuron tries to reproduce the inputs from its output. Generalizing this observation, it is logical to propose a rule in which the inputs are restored from the whole output vector. In this way we obtain Oja's training rule for a one-layer network. The hidden layer of such an ANN, like the Oja layer, performs an optimal coding of the input data and preserves the maximum variety of the data under the existing restrictions.

Example: let us replace the linear activation function in the training rule with a sigmoid. This brings a new property (Oja et al., 1991): such an algorithm has been used, in particular, for the decomposition of signals mixed in an unknown way (i.e. blind signal separation), for example when we want to separate a human voice from noise.
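One form of this sigmoidal modification found in the nonlinear-PCA / blind-separation literature (quoted here as an assumption, not as the slide's exact formula) simply replaces the outputs in the subspace rule by their nonlinearly transformed values:

    \Delta w_{ji} = \eta\, g(y_j)\Bigl(x_i - \sum_k g(y_k)\, w_{ki}\Bigr), \qquad g(u) = \tanh(u)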

Competition of neurons: the winner takes all (basic algorithm). The activity of the competitive layer is computed as before; the winner is the neuron with the maximum response to the presented input, and only the winner's weights are then trained.
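In formulas (a sketch; \eta is the learning rate):

    j^{*} = \arg\max_{j}\, \mathbf{w}_j^{\top}\mathbf{x}, \qquad \Delta \mathbf{w}_{j^{*}} = \eta\,(\mathbf{x} - \mathbf{w}_{j^{*}})

Only the winner's weight vector is pulled towards the presented input; all other weights stay unchanged.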

The winner does not take all. One variant of the basic training rule for a competitive layer consists in training not only the winner neuron but also its "neighbors", though at a lower rate. This approach of "pulling up" the neurons nearest to the winner is used in topographic Kohonen maps. The neighborhood function equals one for the winner neuron and gradually falls off with distance from it. Kohonen training resembles stretching an elastic grid of prototypes over the data set from the training sample.
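With a neighborhood function h centered on the winner j^{*} (the Gaussian form below is an assumed, typical choice):

    \Delta \mathbf{w}_j = \eta\, h(j, j^{*})\,(\mathbf{x} - \mathbf{w}_j), \qquad h(j, j^{*}) = \exp\Bigl(-\tfrac{\|\mathbf{r}_j - \mathbf{r}_{j^{*}}\|^2}{2\sigma^2}\Bigr)

where r_j is the position of neuron j on the grid and the radius \sigma is gradually decreased during training.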

Methodology of self-organizing maps. Schematic representation of a self-organizing network: Kohonen training resembles stretching an elastic grid of prototypes over the data set from the training sample. The neurons of the output layer are ordered and correspond to the cells of a two-dimensional map, which can be colored according to the affinity of attributes.
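A minimal SOM training loop along these lines (the data, grid size and parameter schedules are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((2000, 3))                     # training sample, d = 3 attributes

    rows, cols = 10, 10                           # 2-D grid of prototypes
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    W = rng.random((rows * cols, 3))              # prototype (weight) vectors

    n_steps = 5000
    for t in range(n_steps):
        x = X[rng.integers(len(X))]
        winner = np.argmin(((W - x) ** 2).sum(axis=1))      # best-matching unit
        eta = 0.5 * (1 - t / n_steps)                        # decaying learning rate
        sigma = 3.0 * (1 - t / n_steps) + 0.5                # shrinking neighborhood radius
        h = np.exp(-((grid - grid[winner]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += eta * h[:, None] * (x - W)                      # pull winner and its neighbors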

Visualization of the topographic map induced by the i-th component of the input data. A convenient visualization tool is the coloring of topographic maps, similar to what is done with ordinary geographical maps. Each attribute of the data generates its own coloring of the map cells, according to the average value of that attribute over the data that fall into a given cell. Collecting the maps of all attributes of interest, we obtain a topographic atlas that gives an integrated picture of the structure of the multivariate data.
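Continuing the sketch above, the coloring of one such map (a "component plane") is just the per-cell average of an attribute; this hypothetical continuation of the previous snippet reuses its W, X, rows and cols:

    # Assign every sample to its best-matching cell, then average attribute 0 per cell.
    bmu = ((W[None, :, :] - X[:, None, :]) ** 2).sum(axis=2).argmin(axis=1)
    plane = np.full(rows * cols, np.nan)
    for cell in range(rows * cols):
        hits = X[bmu == cell]
        if len(hits):
            plane[cell] = hits[:, 0].mean()    # average of the 0-th attribute in this cell
    plane = plane.reshape(rows, cols)          # ready to be drawn as a colored map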

Methodology of self-organizing maps. Classified SOM for the NASDAQ 100 index for the period from 10-Nov-1997 to 27-Aug-2001.

Complexity of the algorithm. When is it better to reduce the dimension of the input information, and when to quantize it? Both cases use a one-layer ANN with d inputs and m output neurons, i.e. d·m synaptic weights. The two approaches are compared by the number of training patterns, the number of operations, the compression coefficient (with b the capacity of the data in bits), and the resulting complexity at the same compression coefficient.
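The compression coefficients can be written as follows (a sketch reconstructed from the standard treatment, not taken verbatim from the slide). For dimension reduction, each d-component vector is replaced by m components, so

    K_{dim} = \frac{d}{m}

For quantization, a vector of d components of b bits each is replaced by the index of one of m prototypes, so

    K_{quant} = \frac{d\,b}{\log_2 m}

Quantization can therefore reach a much larger compression coefficient, but the number of prototypes m needed for a given accuracy grows very quickly, and with it the number of operations.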

JPEG example. The image is divided into 8 x 8 pixel blocks, which serve as the input vectors we want to compress; in our case d = 64. The number of gradations of gray in the image sets the capacity b and hence the accuracy of the represented data; with larger blocks, d = 64 x 64, the compression coefficient exceeds K > 10^3.
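A toy version of this block-quantization idea, using k-means as the prototype finder (the stand-in image, block size, number of prototypes and use of scikit-learn are illustrative assumptions):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(512, 512)).astype(float)   # stand-in grayscale image

    # Cut the image into 8x8 blocks -> input vectors of dimension d = 64.
    blocks = (image.reshape(64, 8, 64, 8)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, 64))

    m = 256                                                        # number of prototypes
    km = KMeans(n_clusters=m, n_init=4, random_state=0).fit(blocks)
    codes = km.predict(blocks)                                     # one log2(256) = 8 bit index per block

    # Each block of 64 pixels x 8 bits is replaced by an 8-bit code: K = 64*8/8 = 64.
    decoded = (km.cluster_centers_[codes]
                 .reshape(64, 64, 8, 8)
                 .transpose(0, 2, 1, 3)
                 .reshape(512, 512))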

Any questions?