Learning

What is learning?

Supervised Learning

• Training data that has been classified
• Examples:
  – Concept learning
  – Decision trees
  – Markov models
  – Nearest neighbor
  – Neural nets (in coming weeks)
• Inductive bias: limits imposed by assumptions!
  – Especially which factors we choose as inputs

Rote Learning

• Store the training data
• Limitation: does not extend beyond what has been seen
• Example: concept learning
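
A rote learner is just a lookup table. Here is a minimal sketch (the class and method names are illustrative, not from the slides):

```python
# Rote learning sketch: memorize (input, label) pairs and answer only
# for inputs seen verbatim during training.
class RoteLearner:
    def __init__(self):
        self.memory = {}

    def train(self, examples):
        for x, label in examples:        # examples: (tuple, label) pairs
            self.memory[x] = label

    def classify(self, x):
        # None for anything unseen: the limitation noted above,
        # no generalization beyond the stored training data.
        return self.memory.get(x)

learner = RoteLearner()
learner.train([(("sunny", "warm"), True), (("rainy", "cold"), False)])
print(learner.classify(("sunny", "warm")))   # True
print(learner.classify(("sunny", "cold")))   # None: never seen
```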

Concept Learning

• Inductive learning with generalization
• Given training data:
  – tuples <a1, a2, a3, …>
  – each labeled with a Boolean value
  – ai can be any value
  – ? is used for a "don't care" that matches any value (positive)
  – null is used for a "don't care" that matches no value (negative)

• A hypothesis is a tuple that evaluates to true on the instances it matches
• hg = <?, ?, …>: most general, always true
• hs = <null, null, …>: most specific, always false
• hg >= hs
• The set of hypotheses forms a partially ordered lattice
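
As a concrete reading of this representation, here is a small sketch, assuming '?' is the match-anything wildcard, None stands in for null, and instances are tuples of strings:

```python
# Hypothesis matching and the generality ordering on the lattice.
def matches(hypothesis, instance):
    # A hypothesis is true for an instance when every position agrees.
    return all(h == '?' or (h is not None and h == a)
               for h, a in zip(hypothesis, instance))

def more_general_or_equal(h1, h2):
    # h1 >= h2 in the lattice: every instance h2 accepts, h1 accepts too.
    return all(a == '?' or b is None or (a is not None and a == b)
               for a, b in zip(h1, h2))

hg = ('?', '?', '?')        # most general: always true
hs = (None, None, None)     # most specific: always false
print(matches(hg, ('sunny', 'warm', 'high')))   # True
print(matches(hs, ('sunny', 'warm', 'high')))   # False
print(more_general_or_equal(hg, hs))            # True
```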

Training Method

• Use the lattice to generate the most general hypothesis
• Weaknesses:
  – Inconsistent data
  – Data errors
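
The slides do not fix a particular lattice-walking procedure; one standard instance is Mitchell's Find-S, sketched here, which starts at hs and generalizes only as far as the positive examples force it:

```python
# Find-S-style sketch: start from the most specific hypothesis and
# generalize just enough to cover each positive example.
def find_s(examples, n_attrs):
    h = [None] * n_attrs                 # hs: matches nothing
    for instance, label in examples:
        if not label:                    # negatives are ignored here
            continue
        for i, value in enumerate(instance):
            if h[i] is None:             # first positive: copy it
                h[i] = value
            elif h[i] != value:          # conflict: generalize to '?'
                h[i] = '?'
    return tuple(h)

data = [(("sunny", "warm", "high"), True),
        (("sunny", "warm", "low"),  True),
        (("rainy", "cold", "high"), False)]
print(find_s(data, 3))   # ('sunny', 'warm', '?')
```

Because every conflicting positive pushes an attribute to '?', a single mislabeled example over-generalizes the hypothesis, which is exactly the weakness the slide notes.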

Decision Trees

• ID3 algorithm
• Entropy: a measure of information
  – -p(I) log2 p(I) is the entropy contribution of one element
• Entropy of the system of information:
  – Entropy(S) = Σ -p(I) log2 p(I)
  – p(I) is (instances of I) / (total instances)
  – This is computed over the outputs of the tree
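
A direct transcription of the formula, with the class probabilities estimated from label counts:

```python
import math
from collections import Counter

# Entropy(S) = sum of -p(I) * log2(p(I)) over the output classes,
# where p(I) = (instances of I) / (total instances).
def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

print(entropy(["yes"] * 9 + ["no"] * 5))   # ~0.940
```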

• Gain is a measure of the effectiveness of an attribute
• Gain(S, A) = Entropy(S) - Σv ((|Sv| / |S|) * Entropy(Sv))
  – Sv is the subset of S for which attribute A has value v
  – |S| is the total number of examples
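
A sketch of the gain computation, assuming examples are (attribute-tuple, label) pairs and attributes are addressed by index:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# Gain(S, A): entropy of S minus the entropy remaining after splitting
# S into the subsets Sv, each weighted by |Sv| / |S|.
def gain(examples, attr_index):
    subsets = {}
    for instance, label in examples:
        subsets.setdefault(instance[attr_index], []).append(label)
    labels = [label for _, label in examples]
    return entropy(labels) - sum(len(sub) / len(examples) * entropy(sub)
                                 for sub in subsets.values())

data = [(("sunny", "hot"), False), (("sunny", "mild"), True),
        (("rainy", "mild"), True), (("rainy", "hot"), False)]
print(gain(data, 0))   # 0.0: attribute 0 tells us nothing here
print(gain(data, 1))   # 1.0: attribute 1 splits the labels perfectly
```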

ID3

• Greedy algorithm
• Select the attribute with the largest gain
• Iterate until done
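
Putting entropy and gain together, a compact recursive sketch of the greedy loop (attribute indices and the tuple representation are assumptions, as above; entropy and gain are repeated so the block runs on its own):

```python
import math
from collections import Counter

def entropy(labels):                     # as in the sketch above
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, i):                   # as in the sketch above
    subsets = {}
    for x, y in examples:
        subsets.setdefault(x[i], []).append(y)
    return entropy([y for _, y in examples]) - sum(
        len(s) / len(examples) * entropy(s) for s in subsets.values())

# Greedy ID3: split on the attribute with the largest gain, recurse
# until a node is pure or no attributes remain.
def id3(examples, attributes):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:            # pure node: emit the label
        return labels[0]
    if not attributes:                   # out of attributes: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda i: gain(examples, i))
    rest = [a for a in attributes if a != best]
    branches = {}
    for x, y in examples:
        branches.setdefault(x[best], []).append((x, y))
    return (best, {v: id3(sub, rest) for v, sub in branches.items()})

data = [(("sunny", "hot"), False), (("sunny", "mild"), True),
        (("rainy", "mild"), True), (("rainy", "hot"), False)]
print(id3(data, [0, 1]))   # (1, {'hot': False, 'mild': True})
```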

Markov Models

• A Markov chain is a set of states
• State transitions are probabilistic
• State xi goes to state xj with probability P(xj | xi)
• This can be extended to allow the probability to depend on a set of past states (memory)
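
A tiny sketch of one probabilistic step; the two-state weather table is invented for illustration:

```python
import random

# From state xi, pick the next state xj with probability P(xj | xi).
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state):
    states = list(P[state])
    weights = [P[state][s] for s in states]
    return random.choices(states, weights=weights)[0]

state = "sunny"
chain = [state]
for _ in range(10):
    state = step(state)     # each transition depends only on the current state
    chain.append(state)
print(chain)
```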

Example from the Text

• Given a set of words, build a Markov chain that generates similar words
• For each letter position in the words, compute the transition probability
• Use a matrix of counts:
  – count[from][to]
• Normalize each row by the total count in that row
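
A minimal sketch of the word generator. It collapses all letter positions into a single count[from][to] table (the slides suggest per-position counts, which would use one such table per position), and the '^' and '$' start/end markers are assumptions:

```python
import random
from collections import defaultdict

def train(words):
    count = defaultdict(lambda: defaultdict(int))
    for w in words:
        letters = ['^'] + list(w) + ['$']
        for a, b in zip(letters, letters[1:]):
            count[a][b] += 1            # count[from][to]
    # Normalize each row by its total count to get probabilities.
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in count.items()}

def generate(P):
    letter, out = '^', []
    while True:
        nxt = random.choices(list(P[letter]), list(P[letter].values()))[0]
        if nxt == '$':                  # end marker: the word is complete
            return ''.join(out)
        out.append(nxt)
        letter = nxt

P = train(["cat", "car", "can", "cot"])
print(generate(P))   # e.g. "cat", "cor", "can", ...
```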

Nearest Neighbor

• 1-NN:
  – Use vectors to represent entities
  – Use a distance measure between vectors to locate the closest known entity
  – Can be affected by noisy data
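
A minimal 1-NN sketch under these assumptions (vectors as tuples of floats, Euclidean distance; the names are illustrative):

```python
import math

# Answer with the label of the single closest training point.
def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nn_classify(training, query):
    # training: list of (vector, label) pairs
    vector, label = min(training, key=lambda t: euclidean(t[0], query))
    return label

data = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.0), "B")]
print(nn_classify(data, (1.1, 1.0)))   # "A"
# A single mislabeled point near the query would flip the answer:
# the noise sensitivity noted above.
```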

k-NN: Better

• Use the k closest neighbors and vote
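
Extending the same sketch to k neighbors with a majority vote:

```python
import math
from collections import Counter

# Take the k closest training points and let them vote, which smooths
# over the single noisy neighbor that can fool 1-NN.
def knn_classify(training, query, k=3):
    dist = lambda t: math.sqrt(sum((a - b) ** 2 for a, b in zip(t[0], query)))
    nearest = sorted(training, key=dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
        ((1.1, 1.1), "B"),                      # a noisy point
        ((5.0, 5.0), "B")]
print(knn_classify(data, (1.05, 1.0), k=3))     # "A": the vote outweighs the noise
```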

Other Techniques (Yet to Cover!)

• Evolutionary algorithms
• Neural nets