Range of lambda in elastic net regression
Given the elastic net regression $$\min_b \frac{1}{2}\| y - Xb \|^2 + \alpha\lambda \| b \|_2^2 + (1 - \alpha) \lambda \| b \|_1$$ how can an appropriate range of $\lambda$ be chosen for cross-validation?...
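A common recipe is the glmnet-style rule: find the smallest $\lambda$ whose $\ell_1$ weight zeroes every coefficient, then descend log-linearly from it. A minimal sketch under the parametrization above, where the $\ell_1$ weight is $(1-\alpha)\lambda$ and there is no $1/n$ factor (function name and defaults are illustrative):

```python
import numpy as np

def lambda_grid(X, y, alpha, n_lambdas=100, eps=1e-3):
    """Log-spaced lambda grid from lambda_max down to eps * lambda_max.

    For the penalty alpha*lambda*||b||_2^2 + (1-alpha)*lambda*||b||_1,
    the KKT conditions give b = 0 whenever the l1 weight (1-alpha)*lambda
    exceeds max_j |x_j' y|, so lambda_max below is the smallest lambda
    that kills every coefficient (assumes alpha < 1).
    """
    lam_max = np.max(np.abs(X.T @ y)) / (1.0 - alpha)
    return np.logspace(np.log10(lam_max), np.log10(eps * lam_max), n_lambdas)
```

Cross-validate over this grid from largest to smallest $\lambda$ (warm starts make the descent cheap); `eps = 1e-3` and roughly 100 grid points are conventional choices, not requirements.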
How to understand singularities in physics?
The question is probably two-fold and I will try not to make it too vague, but nonetheless the question remains general. First fold: In most physical laws, that we have analytic mathematical...
Why do IR divergences cancel in cross sections of next-to-leading diagrams?
I was reading QFT & the Standard Model by Schwartz, Chapter 20, which is about IR divergences. He says that IR divergences only cancel in cross sections for processes involving different initial or final...
Evaluating integral to obtain marginal PDF related to Tikhonov Regularization
I am attempting to derive the marginal PDF for an application of the Gibbs Sampler. My joint PDF contains: $P(b,x) = \frac{1}{\sigma^{n}}\exp \left( -\frac{1}{2\sigma^2}\left\lVert...
Pauli Villars Regularization
Consider the t-channel diagram among the $\phi^4$ one-loop diagrams. Evaluated, with loop momentum $p$, it is $\frac{\lambda^2}{2}\displaystyle\int\frac{d^4p}{(2\pi)^4}\frac{1}{(p+q)^2-m^2}\frac{1}{p^2-m^2}$ If I want to...
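For reference, the standard Pauli–Villars prescription softens such an integrand by subtracting a heavy-regulator propagator; sketched for a single propagator:

```latex
\frac{1}{p^2 - m^2} \;\longrightarrow\;
\frac{1}{p^2 - m^2} - \frac{1}{p^2 - \Lambda^2}
= \frac{m^2 - \Lambda^2}{(p^2 - m^2)(p^2 - \Lambda^2)}
```

The regulated propagator falls off as $1/p^4$ rather than $1/p^2$, rendering the loop integral finite; the regulator is removed by taking $\Lambda \to \infty$ after renormalization.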
Point splitting technique in Peskin and Schroeder
One of the cornerstones of the point splitting technique for calculating the chiral anomaly (Peskin and Schroeder 19.1, p. 655) is a symmetric limit $\epsilon \rightarrow 0$. And this is the point that I don’t...
Renormalization Using Momentum Cut-off Regularization, What Are The...
In most books on QFT, the author discusses various methods of regularization but in the end chooses dimensional regularization and the MS-bar scheme when discussing the final renormalization,...
Early stopping for CNN to improve speed of training
I want to implement early stopping for my convolutional neural network. The main reason is that I want to test my CNN using various parameter settings and some of these may require more iterations than...
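A framework-agnostic patience rule is the usual way to do this; the following is a minimal sketch (class and method names are illustrative, not from any particular library):

```python
class EarlyStopping:
    """Stop training when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.counter = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: reset the wait counter
            self.counter = 0
        else:
            self.counter += 1         # no improvement this epoch
        return self.counter >= self.patience
```

In the training loop, call `should_stop(val_loss)` once per epoch and break when it returns True; keeping a checkpoint of the best-loss model lets you restore it after stopping.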
Doubts with basic renormalization
When we renormalize to obtain the physical mass, the $\Lambda$ dependence of the physical mass is removed by introducing the counterterms in the Lagrangian. So whether we put $\Lambda \rightarrow \infty$ or...
Robustness to deviation from normality with regularized VAR model – references
I was listening to a talk in which the presenter discussed using regularized estimation approaches in a VAR(1) model $$X_t = \Gamma X_{t-1} + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0,\Omega).$$...
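As an illustrative sketch (not the presenter's method), one can simulate a stable VAR(1) and fit $\Gamma$ by ridge-penalized least squares, regressing $X_t$ on $X_{t-1}$; all data and settings below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
p, T = 3, 500
Gamma = np.array([[0.5, 0.1, 0.0],
                  [0.0, 0.4, 0.2],
                  [0.1, 0.0, 0.3]])   # stable: spectral radius < 1

# Simulate X_t = Gamma X_{t-1} + eps_t with standard normal noise
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = Gamma @ X[t - 1] + rng.normal(size=p)

# Ridge (L2-penalized) estimate of Gamma: regress X_t on X_{t-1}
lam = 0.1
Z, Y = X[:-1], X[1:]
Gamma_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ Y).T
```

With Gaussian $\epsilon_t$ this is just penalized OLS; the robustness question in the talk concerns what happens to such estimates when the noise deviates from normality.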
Prove the estimator $\hat{B}$ of ridge regression = mean of the posterior...
I want to prove that the estimator of ridge regression is the mean of the posterior distribution under a Gaussian prior. $$y \sim N(X\beta, \sigma^2 I), \quad \text{prior } \beta \sim N(0, \gamma^2 I).$$ $$\hat{\beta}...
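The claimed equivalence can also be checked numerically: with $\lambda = \sigma^2/\gamma^2$, the ridge estimator and the Gaussian posterior mean are the same matrix expression. A sketch on synthetic data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
sigma, gamma = 1.0, 2.0
y = X @ np.array([1.0, -2.0, 0.5]) + sigma * rng.normal(size=n)

# Ridge estimator with lambda = sigma^2 / gamma^2
lam = sigma**2 / gamma**2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Posterior mean of beta | y under beta ~ N(0, gamma^2 I), y | beta ~ N(X beta, sigma^2 I):
# (X'X / sigma^2 + I / gamma^2)^{-1} X'y / sigma^2
beta_post = np.linalg.solve(X.T @ X / sigma**2 + np.eye(p) / gamma**2,
                            X.T @ y / sigma**2)
```

Multiplying the posterior-mean system through by $\sigma^2$ recovers the ridge normal equations exactly, which is the algebraic content of the proof.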
Why can't ridge regression provide better interpretability than LASSO?
I already have an idea about the pros and cons of ridge regression and the LASSO. For the LASSO, the L1 penalty term yields a sparse coefficient vector, which can be viewed as a feature selection method....
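The sparsity contrast behind the interpretability claim can be seen on synthetic data: a minimal coordinate-descent lasso (soft-thresholding) produces exact zeros, while the ridge closed form only shrinks. Everything below is an illustrative sketch, not a production solver:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                 # only 3 of 10 features matter
y = X @ beta + 0.1 * rng.normal(size=n)

lam = 50.0
# Ridge: closed form; shrinks but never zeroes coefficients
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Lasso: coordinate descent with soft-thresholding
b_lasso = np.zeros(p)
for _ in range(200):
    for j in range(p):
        r = y - X @ b_lasso + X[:, j] * b_lasso[j]   # partial residual for j
        z = X[:, j] @ r
        b_lasso[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j])
```

The lasso fit reports a short list of active features, which is the sense in which L1 aids interpretability; ridge returns ten small-but-nonzero coefficients.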
The components of the error in $x$ in the damped least squares problem
Could someone explain why the error in $x$ in the damped least squares problem has two components: one from the noise on $b$, and an approximation error from $\tau$?
Why is the Lasso penalty equivalent to the double exponential (Laplace) prior?
I have read in a number of references that the Lasso estimate for the regression parameter vector $B$ is equivalent to the posterior mode of $B$ in which the prior distribution for each $B_i$ is a...
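The equivalence is that the MAP estimate under a Laplace$(0, s)$ prior minimizes $\|y - X\beta\|^2/(2\sigma^2) + \|\beta\|_1/s$, i.e. a lasso with $\lambda = \sigma^2/s$. A one-predictor numerical check (data and hyperparameters are illustrative):

```python
import numpy as np

# 1-D illustration: y_i = x_i * beta + noise, Laplace(0, s) prior on beta
x = np.array([1.0, 2.0, -1.0, 0.5])
y = np.array([0.8, 2.1, -0.7, 0.4])
sigma, s = 1.0, 0.5
lam = sigma**2 / s                 # implied lasso penalty weight

# MAP estimate: minimize ||y - x b||^2 / (2 sigma^2) + |b| / s on a fine grid
grid = np.linspace(-3.0, 3.0, 60001)
neg_log_post = (((y[:, None] - np.outer(x, grid))**2).sum(axis=0) / (2 * sigma**2)
                + np.abs(grid) / s)
b_map = grid[np.argmin(neg_log_post)]

# Lasso solution for one predictor: soft-thresholding of x'y
z = x @ y
b_lasso = np.sign(z) * max(abs(z) - lam, 0.0) / (x @ x)
```

Multiplying the negative log posterior by $\sigma^2$ makes the identification $\lambda = \sigma^2/s$ explicit; the grid search and the soft-threshold formula land on the same minimizer.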
What is elastic net regularization, and how does it solve the drawbacks of...
Is elastic net regularization always preferred to Lasso & Ridge since it seems to solve the drawbacks of these methods? What is the intuition and what is the math behind elastic net?
Why do smaller weights result in simpler models in regularization?
I completed Andrew Ng’s Machine Learning course around a year ago, and am now writing my High School Math exploration on the workings of Logistic Regression and techniques to optimize its performance....
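One way to see the "smaller weights mean simpler models" intuition for logistic regression: the weight scales the steepness of the sigmoid, so large weights produce near-step decision functions that can contort to fit noise, while small weights keep the response smooth. A minimal sketch (the two weight values are arbitrary illustrations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-1.0, 1.0, 5)
p_small = sigmoid(1.0 * x)    # small weight: gentle, almost-linear response
p_large = sigmoid(20.0 * x)   # large weight: near-step function
```

A penalty on the weights biases the fit toward the gentle curve, which is the regularization effect in one picture.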
Why do we need to normalize data before applying penalizing methods in the...
This question already has an answer here: "Question about standardizing in ridge regression" (1 answer)
Choosing alpha for cost complexity pruning as described in Introduction to...
In the following lectures, Tree Methods, they describe a tree algorithm for cost complexity pruning on page 21. It says we apply cost complexity pruning to the large tree in order to obtain a sequence...
Ridge regression — why does the model only care to control large outliers?
One of the purposes of ridge regression is to curb the effects of outliers, which may cause the regression coefficients to become very large and hence produce a highly biased model. That’s why the constraint...
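A complementary way to see why the L2 penalty bears down hardest on large coefficients: its gradient grows linearly with the coefficient, so the shrinkage pressure on a coefficient of 10 is a hundred times that on a coefficient of 0.1, whereas the L1 (sub)gradient is constant regardless of size. Illustrative values:

```python
import numpy as np

lam = 1.0
b = np.array([0.1, 1.0, 10.0])
ridge_grad = 2.0 * lam * b          # gradient of lam * ||b||_2^2: scales with |b|
lasso_grad = lam * np.sign(b)       # (sub)gradient of lam * ||b||_1: constant pressure
```

This is why ridge controls runaway coefficients aggressively while leaving small ones almost untouched, and why lasso, with its size-independent pressure, drives small coefficients all the way to zero.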
Choosing between feature selection and regularization to overcome...
In order to overcome over-fitting during a regression process over categorical features, one can either 1) apply L1/L2/elastic-net regularization during the regression, for example as answered here: When to...