Linear Regression
Linear Regression
• Gaussian noise about y = 4x + 3
• Cost function = mean squared error (RMS also OK, but more steps)
• Analytic fit based on the normal equation: y = 4.203x + 2.911
• But we can make this more complicated.
(See the Jupyter notebook on linear regression.)
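A minimal sketch of the normal-equation fit, assuming NumPy and synthetic data like that described above; the exact fitted coefficients will vary with the random noise:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 4x + 3 plus Gaussian noise
m = 100
x = 2 * rng.random(m)
y = 4 * x + 3 + rng.normal(0, 1, m)

# Normal equation: theta = (X^T X)^{-1} X^T y
X = np.c_[np.ones(m), x]           # prepend a column of ones for the intercept
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)                       # ~[3, 4]: intercept first, then slope
```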
Gradient Descent
• Cost function gradient in feature space
• Sample the gradient, step "down", repeat
• eta sets the step size
• Feature scales should be roughly uniform: a small partial derivative along any one axis increases the number of steps you need to converge.
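A minimal sketch of the update rule for the MSE cost above; the gradient here uses all the data each step (the "batch" variant discussed next), and the data setup, eta, and iteration count are illustrative choices, not the notebook's:

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
x = 2 * rng.random(m)
y = 4 * x + 3 + rng.normal(0, 1, m)
X = np.c_[np.ones(m), x]           # bias column + feature

eta = 0.1                          # learning rate: sets the step size
theta = rng.normal(size=2)         # random initialization

for _ in range(1000):
    # Gradient of the MSE cost: (2/m) * X^T (X theta - y)
    gradients = (2 / m) * X.T @ (X @ theta - y)
    theta -= eta * gradients       # step "down" the gradient

print(theta)                       # approaches the normal-equation fit
```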
Many approaches to gradient descent
• Batch gradient descent
  • Every iteration: calculate the gradient using all the data
  • Slow & "reliable"
• Stochastic gradient descent
  • Every iteration: calculate the gradient at one point in feature space
  • Fast & noisy
  • Can tweak the learning schedule (cost function per iteration) to do interesting things
• Let's look at the notebook.
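And a stochastic variant as a sketch: the gradient is computed at one randomly chosen point per step, with a simple decaying learning schedule; the schedule and its parameters are illustrative, not the notebook's:

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
x = 2 * rng.random(m)
y = 4 * x + 3 + rng.normal(0, 1, m)
X = np.c_[np.ones(m), x]

t0, t1 = 5.0, 50.0                 # illustrative schedule parameters

def learning_schedule(t):
    return t0 / (t + t1)           # eta shrinks as iterations accumulate

theta = rng.normal(size=2)
for epoch in range(50):
    for i in range(m):
        idx = rng.integers(m)                    # one random point per step
        xi, yi = X[idx:idx + 1], y[idx:idx + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)
        theta -= learning_schedule(epoch * m + i) * gradients
print(theta)
```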
Logistic Regression
Logistic Regression
• Based on a model of neurons firing
• Estimate the probability that x is in some binary class
• x0 as bias: how positive does x have to be before the neuron fires?
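A minimal sketch of the neuron analogy, assuming NumPy; the function and parameter names here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes any real z into (0, 1)

def fires(x, w, b, threshold=0.5):
    # b is the bias (the x0 term): it sets how positive w.x must be
    # before the "neuron" output crosses the firing threshold
    p = sigmoid(np.dot(w, x) + b)
    return p, p >= threshold
```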
Logistic Regression
• Example cost function for logistic regression (see below)
• log → probabilities near zero are very bad
• No analytic solution for the minimum of this cost function, but:
  • it is a convex function
  • we can try gradient descent
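A sketch of the usual choice, the log-loss (cross-entropy) cost over m training examples, where \hat{p}^{(i)} is the model's predicted probability for example i; assuming this matches the slide's example:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log \hat{p}^{(i)} + \bigl(1 - y^{(i)}\bigr) \log \bigl(1 - \hat{p}^{(i)}\bigr) \right]
```

The log terms are what make probabilities near zero "very bad": either term diverges as the predicted probability for the true label approaches zero.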
Binary classification with logistic regression (see Jupyter Notebook)
[Plot: logistic-regression class regions for the iris species Setosa, Versicolor, and Virginica]
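A minimal sketch of the binary case in the spirit of the notebook, assuming scikit-learn and its built-in iris data; the single petal-width feature and the virginica-vs-rest split are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 3:]                    # petal width as the single feature
y = (iris.target == 2).astype(int)      # 1 if Iris virginica, else 0

clf = LogisticRegression()
clf.fit(X, y)

# Estimated P(virginica) for two petal widths near the decision boundary
print(clf.predict_proba([[1.7], [1.5]])[:, 1])
```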