Data Exploration and Description and Probability

1 Faculty Salaries

Data from the HR department at a small public university were collected in an effort to examine the salaries of faculty members. Of particular concern were the differences between male and female faculty salaries. The spreadsheet faculty salaries.xlsx contains the data that were collected. The data include variables for gender (M is male, F is female), faculty rank, number of years in rank, nine-month (base) salary, and whether the faculty member had a terminal degree or not (1 is yes, 0 is no). Write a report that describes salaries and determines whether the differences between male and female salaries are of concern.

2 Economic Development

Five hundred households in a middle-class neighborhood were recently surveyed as part of a study conducted by the local economic development district. Specifically, for each of the 500 randomly selected households, the survey requested information on the following variables: family size, approximate location of the household within the neighborhood, an indication of whether those surveyed owned or rented their home, gross annual income of the first household wage earner, gross annual income of the second household wage earner (if applicable), monthly home mortgage or rent payment, average monthly expenditure on utilities, and the total indebtedness (excluding the value of a home mortgage) of the household. The data are in econ development.xls. As part of the study, you are to use these data to formulate a profile of the typical household residing in each of the four neighborhood locations. How do the typical households differ by location?

3 Mutual funds

As an analyst for a financial advisors group, you are responsible for providing statistics about mutual funds that will be used by advisors to help inform clients about how a fund's objectives might affect the returns of the fund. The file mutual fund returns.xls contains data on a cross-section of 868 different funds.
Write a report, to be read by the firm's clients, that describes the differences between growth and value funds.

4 Target marketing

Of your customers, 24% have high income and 17% are well educated. Furthermore, 12% are both high income and well educated. What does this information tell you about a marketing effort that is currently reaching well-educated people, though you would really prefer to target high-income people?

5 Effects of smoking

You work for the American Cancer Society, which has just received access to a major longitudinal study of the effects of smoking. The study is based on a random sample of U.S. women taken twenty years ago, who were asked several questions, one of which was whether the person smoked or not. As this was a longitudinal study, it was then determined whether these same women were alive or not twenty years later. An intern has been working on the data, in anticipation of writing a report that will be distributed to the news media. She has worked up the following preliminary table:

                 Survival Status
    Smoker     Alive    Dead    Total
    Yes          443     139      582
    No           502     230      732
    Total        945     369     1314

That is, Smoker is whether the person was a smoker when interviewed in the first wave of the longitudinal survey twenty years ago, and Survival Status is whether the person was dead or alive twenty years later. From this table, the intern then calculates the following probabilities:

    P(Dead | Smoker)  = 139/582 = 0.2388
    P(Dead | Smoker') = 230/732 = 0.3142

Here Smoker' denotes the complement of Smoker, that is, a non-smoker. The intern has carried these results to your supervisor, who glances at them and says, "There's obviously something wrong with these calculations, since you are more likely to die within twenty years if you are not a smoker." He hands the results to you, along with the spreadsheet smoking.xlsx, and says, "Here, fix this, and then write that section of the report."
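Before revisiting the data, it may help to confirm that the intern's two conditional probabilities follow from the table as given. The short Python sketch below recomputes them from the four cell counts. The dictionary layout and the helper name p_dead_given are illustrative choices, not part of the assignment, and the sketch uses only the counts printed above, not the contents of smoking.xlsx.

```python
# Cell counts copied from the intern's preliminary table.
# The nested-dict layout is an illustrative assumption.
counts = {
    "smoker": {"alive": 443, "dead": 139},
    "nonsmoker": {"alive": 502, "dead": 230},
}


def p_dead_given(group):
    """Return P(Dead | group): the group's dead count over its row total."""
    row = counts[group]
    return row["dead"] / (row["alive"] + row["dead"])


p_smoker = p_dead_given("smoker")        # 139 / 582
p_nonsmoker = p_dead_given("nonsmoker")  # 230 / 732

print(f"P(Dead | Smoker)  = {p_smoker:.4f}")
print(f"P(Dead | Smoker') = {p_nonsmoker:.4f}")
```

The recomputed values agree with the intern's figures to four decimal places, so the two ratios themselves contain no arithmetic slip.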