Why do smaller weights result in simpler models in regularization?

I completed Andrew Ng’s Machine Learning course around a year ago, and am now writing my High School Math exploration on the workings of Logistic Regression and techniques to optimize its performance. One of these techniques is, of course, regularization.

The aim of regularization is to prevent overfitting by extending the cost function to include the goal of model simplicity. We achieve this by penalizing the size of the weights: we add to the cost function the sum of the squared weights, multiplied by some regularization parameter.
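
Concretely, the regularized cost for logistic regression from the course looks something like this (with $m$ training examples, $n$ features, hypothesis $h_\theta$, and regularization parameter $\lambda$):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta\left(x^{(i)}\right)\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

where, by convention, the bias weight $\theta_0$ is left out of the penalty term.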

Now, the machine learning algorithm will aim to reduce the size of the weights whilst retaining accuracy on the training set. The idea is that we reach a middle ground: a less complex model that generalizes to the data instead of fitting all the stochastic noise.
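
As a sketch of this mechanism (assuming plain batch gradient descent; the function and variable names here are my own, not from the course code), the penalty simply adds an extra pull of every weight towards zero on each update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha, lam):
    """One batch gradient-descent step for L2-regularized logistic regression.

    theta : (n,) weight vector, theta[0] being the unpenalized bias
    X     : (m, n) design matrix whose first column is all ones
    y     : (m,) labels in {0, 1}
    alpha : learning rate
    lam   : regularization parameter lambda
    """
    m = len(y)
    error = sigmoid(X @ theta) - y      # prediction error on the training set
    grad = (X.T @ error) / m            # gradient of the unregularized cost
    penalty = (lam / m) * theta         # pull of each weight towards zero
    penalty[0] = 0.0                    # do not shrink the bias term
    return theta - alpha * (grad + penalty)
```

So each step trades off fitting the training set (the grad term) against shrinking the weights (the penalty term), and a larger λ means stronger shrinkage.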

My confusion is this: why do we penalize the size of the weights? Why do larger weights create more complex models, and smaller weights simpler/smoother ones? Andrew Ng says in his lecture that the explanation is a difficult one to teach, but that explanation is exactly what I am looking for.

Prof. Ng did give an example of how the new cost function may drive the weights of high-order features (e.g. x^3 and x^4) towards zero so that the model’s effective degree is reduced, but this is not a complete explanation.
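
To see that effect for myself, here is a small experiment I would expect to reproduce it, using ridge regression (L2-penalized linear regression) on polynomial features as a stand-in; the data and the value of λ are made up for illustration:

```python
import numpy as np

# Noisy samples of a roughly linear relationship (synthetic data, illustration only)
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 2 * x + 0.3 * rng.standard_normal(x.size)

# Degree-4 polynomial features: columns are 1, x, x^2, x^3, x^4
X = np.vander(x, N=5, increasing=True)

for lam in (0.0, 10.0):
    # Closed-form ridge solution: (X^T X + lambda * I) theta = X^T y
    # (the intercept column is penalized too here, just to keep the sketch short)
    theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda={lam:5.1f}  theta={np.round(theta, 3)}")
```

If the weights on x^3 and x^4 do shrink towards zero for the larger λ, the fitted curve is effectively closer to a straight line, which is the degree reduction he described.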

My intuition is that smaller weights are more “acceptable” on features with greater exponents than on ones with smaller exponents (because the features with smaller exponents are like the basis of the function): smaller weights imply smaller “contributions” from the high-order features. But this intuition is not very concrete.

