Linear population projections for Phoenix and Tempe using Excel
Assignment Overview
This assignment gives you the chance to practice calculating linear population projections for Phoenix and Tempe using Excel. Your Excel spreadsheet is due on Blackboard Monday night 4/2 by midnight.
Assignment Steps
Open the Excel file and examine its contents. There are two tabs – one for each city. In each tab, you’ll see 7 columns labeled. Four of them have data in them, and the one on the far right is MAG’s official population projections. You’re going to be filling in the other three with your own projections!
Follow the steps outlined below to calculate the linear population projections for each city from 2016 through 2050. Then, make sure you provide written answers to the three questions at the bottom of each spreadsheet tab.
There are two ways to obtain single-variable regression results in Excel. The first is to graph your data on a scatterplot, and ask Excel to add a trend line and its equation. The second is to use the regression function in Excel – which can be used to obtain regression results when you have multiple independent variables as well. In this lab, you will practice each.
PHOENIX: USING SCATTERPLOTS AND TRENDLINES
On the Phoenix tab, you will first ask Excel to estimate the best fit regression line for all of the included data years (1950-2015). To do so using the graphing method:
1. Highlight the relevant data in the “Year” column and the “Population (in 1000’s)” column.
2. Go to the “Insert” menu, and insert an “XY (Scatter)” chart. The icon for this looks like lots of multicolored dots in the graph.
3. Resize the inserted chart so it’s big enough to see easily, and change the title to indicate that this is the chart based on the data for years 1950-2015.
4. Now select the points that are displaying the data on the graph, right-click (control-click on the Mac), and select “Add Trendline…”.
5. A new menu will open up on the right of the screen, with a whole bunch of options. Choose “Linear” from the first set of choices (this is the default), type “35” in the box next to “Forecast, Forward”, and select the checkboxes for both “Display equation on chart” and “Display R-squared value on chart”.
6. Select the equation textbox in the chart, and increase the font size so it is readable. This is the equation of the best fit regression line for your data!
7. The “Forecast, Forward” option already should have projected the line on the chart to 2050, but you want to get the numbers that go along with this. To do so, use the equation for the best fit regression line to obtain the projected population numbers. Specifically, enter the equation into the Excel cell D24 (where “x” in the equation is the future year, and “y” is the projected population), and copy that down through Excel cell D58. To be clear, the equation is this:
Population = (number)*Year + (number: this one will be negative)
Now you have a linear projection for the Phoenix population through 2050 based on historical population data from 1950-2015. Great job!
To complete the Phoenix tab, follow the same steps to create graphical and numeric projections based on historical population data from 2000-2015 (column E), and based on historical population data from 2010-2015 (column F). Look at the results, compare them to MAG’s projections in column G, and write the answer to the question at the bottom of the tab.
TEMPE: USING EXCEL’S REGRESSION FUNCTION
On the Tempe tab, you will practice using Excel’s regression function to do the same task. To do this, you first need to activate the Analysis Toolpak. Go to Tools -> Excel Add-ins, and click the box next to “Analysis Toolpak”. This activates a new menu option under Tools – “Data Analysis…”.
To use Excel’s regression function, follow these steps:
1. Choose Tools ->Data Analysis, and a pop-up window will appear. From this window, now choose “Regression”. You’ll notice a lot of other tools there as well – feel free to check them out!
2. Now a Regression pop-up window should have opened. Choose the “Year” column of data as the X data (and don’t include the cell with the word “Year” in it), and choose the “Population (in 1000’s)” column as the Y data. Choose the option to add a “Line fit plot” to the output, and put the output in a new worksheet.
3. Examine what you see in the new worksheet that gets created with your regression results. At the top there are some “Regression Statistics” including the “R-squared”, “Observations”, etc. Then there is a section called “ANOVA”, which you don’t need to worry about for this class. The next unlabeled section is the regression coefficients and associated statistical test results that tell us if each of the estimated coefficients is likely to actually be zero. The “Coefficients” column indicates the slope and intercept of the best fit regression line through your data. Below that, you’ll see the “Residuals” output. Sometimes this can be useful to see where your best fit line actually fits the data well, and where it doesn’t. For this assignment, you don’t need to worry about the residuals.
4. To convince yourself that this is the same as what you did with the Phoenix data, select the series of dots in the scatterplot that are the original data, and use the steps above to add a trendline and equation. The slope and intercept of the trendline equation should exactly match the “Coefficients” column at left.
5. Finally, go back to your Tempe tab, and use the regression coefficients to calculate the projected population for years 2016-2050.
Now you have a linear projection for the Tempe population through 2050 based on historical population data from 1950-2015. Great job!
To complete the Tempe tab, follow these same steps to run regressions to create graphical and numeric projections based on historical population data from 2000-2015 (column E), and based on historical population data from 2010-2015 (column F). Look at the results, compare them to MAG’s projections in column G, think a bit, and write answers to the questions at the bottom of the tab.
Your final assignment should include five tabs: Phoenix (with three embedded graphs and three answered questions), Tempe (with one answered question), and three appropriately labeled tabs for each of the three Tempe data regressions you ran.