Tuesday, November 7, 2023

Multiple linear regression (2)

Conducting multiple regression analysis requires a fair amount of pre-work before actually running the regression:

1. Generate a list of potential variables, both independent and dependent.

2. Collect data on the variables.

3. Check the relationship between each independent variable and the dependent variable using scatterplots and correlations.

4. Check the relationships among the independent variables using scatterplots and correlations.

5. (Optional) Conduct a simple linear regression for each independent variable/dependent variable (IV/DV) pair.

6. Use the non-redundant independent variables in the analysis to find the best-fitting model.

7. Use the best-fitting model to make predictions about the dependent variable.
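The steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic, the variable names (`size`, `rooms`, `age`, `price`) are hypothetical, and the 0.8 correlation cutoff for calling two independent variables redundant is an arbitrary choice, not a rule from this post.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Steps 1-2: synthetic data standing in for collected data (hypothetical variables).
size = rng.normal(150, 30, n)                  # independent variable
rooms = size / 30 + rng.normal(0, 0.3, n)      # independent variable, closely tied to size
age = rng.uniform(0, 50, n)                    # independent variable
price = 2.0 * size - 1.5 * age + rng.normal(0, 10, n)  # dependent variable

names = ["size", "rooms", "age"]
X = np.column_stack([size, rooms, age])

# Step 3: correlation of each independent variable with the dependent variable.
iv_dv = {name: np.corrcoef(X[:, j], price)[0, 1] for j, name in enumerate(names)}

# Step 4: correlations among the independent variables; flag pairs above
# an assumed cutoff of 0.8 as redundant.
iv_iv = np.corrcoef(X, rowvar=False)
redundant = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(iv_iv[i, j]) > 0.8
]

# Step 6: drop one variable from each redundant pair (here "rooms") and
# fit the model by ordinary least squares with an intercept column.
keep = [j for j, name in enumerate(names) if name != "rooms"]
A = np.column_stack([np.ones(n), X[:, keep]])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)

# Step 7: predict the dependent variable for a new observation
# (intercept term, size = 160, age = 10).
new_obs = np.array([1.0, 160.0, 10.0])
prediction = new_obs @ coef

print("IV/DV correlations:", iv_dv)
print("Redundant IV pairs:", redundant)
print("Coefficients (intercept, size, age):", coef)
print("Prediction:", prediction)
```

Scatterplots are left out to keep the sketch short; in practice they are worth drawing, because a correlation coefficient only captures linear relationships.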










One way to measure multicollinearity is the variance inflation factor (VIF), which assesses how much the variance of an estimated regression coefficient is inflated because the predictors are correlated with one another. If no predictors are correlated, every VIF equals 1. As a rule of thumb, a VIF between 5 and 10 indicates high correlation that may be problematic, and a VIF above 10 suggests that the regression coefficients are poorly estimated because of multicollinearity.
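As a rough illustration, the VIF of predictor j can be computed by regressing that predictor on all the other predictors and taking 1 / (1 - R_j^2), where R_j^2 is the R-squared of that auxiliary regression. The sketch below does this with NumPy; the function name `vif` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor for each column of X.

    X has shape (n_samples, n_predictors) and does not include an
    intercept column; one is added for each auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic example: x3 is nearly a copy of x1, x2 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(dict(zip(["x1", "x2", "x3"], vifs)))
```

In this setup x1 and x3 should show very large VIFs while x2 stays close to 1, which matches the thresholds described above.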










Examine the difference between the adjusted R-squared, R-Sq(adj), and the predicted R-squared, R-Sq(pred). A large drop from R-Sq(adj) to R-Sq(pred) indicates overfitting, that is, too many variables in the model.
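The comparison can be reproduced outside a statistics package. The sketch below assumes that R-Sq(pred) is the usual PRESS-based predicted R-squared (each observation is predicted from a model fitted without it) and computes it through the leverage shortcut rather than refitting n times. The function name `r2_summary` and the synthetic data are made up for the example.

```python
import numpy as np

def r2_summary(X, y):
    """Return (R-squared, adjusted R-squared, predicted R-squared) for OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)
    sse = resid @ resid
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Leverages: diagonal of the hat matrix A (A'A)^-1 A'.
    leverage = np.sum(A * (A @ np.linalg.pinv(A.T @ A)), axis=1)
    # PRESS: sum of squared leave-one-out prediction errors.
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    r2_pred = 1.0 - press / sst
    return r2, r2_adj, r2_pred

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
noise = rng.normal(size=(n, 20))                  # 20 predictors unrelated to y

lean = r2_summary(x.reshape(-1, 1), y)            # one real predictor
overfit = r2_summary(np.column_stack([x, noise]), y)

gap_lean = lean[1] - lean[2]                      # R-Sq(adj) minus R-Sq(pred)
gap_overfit = overfit[1] - overfit[2]
print("lean    R2, adj, pred:", lean)
print("overfit R2, adj, pred:", overfit)
print("drop-off lean vs overfit:", gap_lean, gap_overfit)
```

With 20 irrelevant predictors and only 30 observations, the drop from R-Sq(adj) to R-Sq(pred) should be much larger than in the one-predictor model, which is the overfitting signal described above.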

