To overcome overfitting when regressing on categorical features, one can either:
1) Apply L1/L2/Elastic Net regularization during the regression, as suggested, for example, in
When to use regularization methods for regression?
2) Filter out categories that appear fewer than X times, where X is at least 10-20 (see the sketch after this list), as discussed, for example, in
Minimum number of observations for multiple linear regression
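To make method 2 concrete, here is a minimal sketch of what I mean by filtering (Python/pandas assumed; the `min_count` threshold and the choice to collapse rare levels into an "other" bucket are my own illustrative choices, not prescribed anywhere):

```python
import pandas as pd

def filter_rare_categories(df: pd.DataFrame, column: str, min_count: int = 10) -> pd.DataFrame:
    """Collapse levels of `column` appearing fewer than `min_count` times
    into a single 'other' level, so each rare level does not get its own
    dummy variable in the regression."""
    counts = df[column].value_counts()
    rare = counts[counts < min_count].index
    out = df.copy()
    out.loc[out[column].isin(rare), column] = "other"
    return out

# Example: a long-tailed categorical column
df = pd.DataFrame({"city": ["NY"] * 50 + ["LA"] * 30 + ["Oslo"] * 2 + ["Reno"] * 1})
df = filter_rare_categories(df, "city", min_count=10)
print(df["city"].value_counts())  # NY 50, LA 30, other 3
```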
The advantage of method 2 is that this rule of thumb lets me set X to 10-20 directly, whereas the regularization method requires a computationally costly cross-validation step to choose the regularization parameter properly, which in effect multiplies the regression's running time by the number of folds times the number of candidate parameter values. In addition, when the categories have a long-tailed distribution, filtering greatly reduces the number of features, which can mean a big decrease in running time and memory consumption.
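For comparison, this is roughly what the regularization route looks like; I am assuming scikit-learn's LassoCV here, and the synthetic long-tailed data and the grid of 100 candidate alphas are illustrative, chosen only to show where the extra fits come from:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
# Long-tailed categorical feature: a few big levels, many tiny ones.
levels = ["a"] * 60 + ["b"] * 30 + [f"rare{i}" for i in range(10)]
df = pd.DataFrame({"cat": rng.choice(levels, size=500)})
df["y"] = (df["cat"] == "a").astype(float) + rng.normal(scale=0.1, size=500)

# One-hot encode: every level, including the rare ones that method 2
# would have filtered out, becomes its own column.
X = pd.get_dummies(df[["cat"]])

# 5-fold CV over a grid of 100 alphas means on the order of 500 model
# fits -- the extra running time the question is weighing against method 2.
model = LassoCV(n_alphas=100, cv=5, random_state=0).fit(X, df["y"])
print("chosen alpha:", model.alpha_)
```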
What is the common practice? When should I prefer one method over the other? Is the second method acceptable on its own, or is the correct way always to perform some form of regularization?