A cost estimator for a construction company has collected the data in the source file Estimation.xlsx, which describes the total cost (Y) of 97 different projects and the following three independent variables thought to influence the total cost: total units of work required (X1), contracted units of work per day (X2), and city/location of work (X3). The cost estimator would like to develop a regression model to predict the total cost of a project as a function of these three independent variables.
a. Prepare two scatter plots showing the relationship between the total cost of the projects and each of the two independent variables X1 and X2. What sort of relationship does each plot suggest? After completing the analysis, write your interpretation in any cell of the Excel workbook.
b. Suppose the estimator wants to use the total units of work required (X1), contracted units of work per day (X2), and city/location of work (X3) as the independent variables to predict total cost. What is the estimated regression function relating Y to X1, X2, and X3? What is the adjusted R-squared value of this model? What conclusions can you draw? (Note that X3 is a categorical variable and must be coded with dummy variables, as I showed you in the class lecture. Expand X3 into five dummy variables, Location1, Location2, …, Location5, to differentiate the six locations; the remaining location serves as the baseline category.) After completing the analysis, write your interpretation in any cell of the Excel workbook.
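For anyone checking the Excel work in part b outside the spreadsheet, the dummy coding and the adjusted R-squared formula can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the location codes 1 to 6, the choice of code 6 as the baseline, and the helper names dummy_encode and adjusted_r_squared are hypothetical and are not taken from Estimation.xlsx or the lecture. With X1, X2, and five dummies, the model has k = 7 predictors and n = 97 observations.

```python
# Hypothetical sketch: the location codes below are made up for illustration
# and are NOT the actual contents of Estimation.xlsx.

def dummy_encode(values, baseline):
    """Return (column_names, rows): one 0/1 column per non-baseline category."""
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    kept = [c for c in categories if c != baseline]
    names = [f"Location{c}" for c in kept]
    # Each row has a 1 in the column matching its category, 0 elsewhere.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return names, rows


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Six assumed location codes; code 6 is treated as the baseline category.
locations = [1, 2, 3, 4, 5, 6, 3, 1]
names, rows = dummy_encode(locations, baseline=6)
print(names)    # five dummy column names
print(rows[5])  # the baseline row has 0 in every dummy column
```

A project in the baseline location has all five dummies equal to 0, so each dummy coefficient in the fitted model is read as the cost difference relative to that baseline, holding X1 and X2 fixed.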