• Identify a topic with a dataset suitable for linear regression modeling that contains both numerical and categorical variables.
• Formulate a feasible hypothesis that you want to test.
• Provide a comprehensive numerical and graphical summary of the data, including an analysis of outliers.
• Analyze and compare different approaches to building a linear regression model, e.g., feature selection, parameter estimation methods for the model coefficients, and variable transformations. Use the goodness-of-fit measures you learned throughout the course to identify the ‘best-fitting’ model for your problem.
• Assess the predictive performance of your models using cross-validation, and compare it with their goodness of fit.
• Submit your work as a detailed write-up that summarizes your approach, findings, and conclusions. Include your R code either as an appendix or as exhibits.
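The comparison of goodness of fit with cross-validated performance can be sketched in R as follows. This is only a minimal illustration under stated assumptions: it uses the built-in `mtcars` data as a stand-in for your own dataset, the two candidate formulas are arbitrary, and `cv_rmse` is a hypothetical helper written for this sketch, not a function from any package.

```r
# Stand-in data: mtcars ships with base R; replace it with your own dataset.
data(mtcars)
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))  # categorical predictors

# Two hypothetical candidate models (assumed for illustration only).
f_small <- mpg ~ wt + cyl
f_large <- mpg ~ wt + hp + qsec + cyl + am

# In-sample goodness of fit: adjusted R-squared and AIC.
fit_small <- lm(f_small, data = cars)
fit_large <- lm(f_large, data = cars)
gof <- data.frame(
  model  = c("small", "large"),
  adj_r2 = c(summary(fit_small)$adj.r.squared, summary(fit_large)$adj.r.squared),
  aic    = c(AIC(fit_small), AIC(fit_large))
)

# Hypothetical helper: k-fold cross-validated RMSE for a linear model.
cv_rmse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of near-equal size.
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  sq_err <- numeric(0)
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    sq_err <- c(sq_err, (test[[response]] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Out-of-sample performance next to the in-sample measures.
gof$cv_rmse <- c(cv_rmse(f_small, cars), cv_rmse(f_large, cars))
print(gof)
```

A model that looks better on adjusted R-squared or AIC but has a higher cross-validated RMSE may be overfitting, which is the kind of discrepancy the write-up should discuss.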
Sample Solution