Data Exploration and Description and Probability
1 Faculty Salaries
Data from the HR department at a small public university were collected in an effort to examine
the salaries of faculty members. Of particular concern were the differences between male and female
faculty salaries. The spreadsheet faculty salaries.xlsx contains the data that were collected. The
data include variables for gender (M is male, F is female), faculty rank, number of years in rank,
nine-month (base) salary, and whether the faculty member had a terminal degree or not (1 is
yes, 0 is no). Write a report that describes salaries and determines whether the differences between
male and female salaries are of concern.
2 Economic Development
Five hundred households in a middle-class neighborhood were recently surveyed as part of a study
conducted by the local economic development district. Specifically, for each of the 500 randomly
selected households, the survey requested information on the following variables: family size,
approximate location of the household within the neighborhood, an indication of whether those
surveyed owned or rented their home, gross annual income of the first household wage earner,
gross annual income of the second household wage earner (if applicable), monthly home mortgage
or rent payment, average monthly expenditure on utilities, and the total indebtedness (excluding
the value of a home mortgage) of the household. The data are in econ development.xls. As part
of the study, you are to use these data to formulate a profile of the typical household residing
in each of the four neighborhood locations. How do the typical households differ by location?
3 Mutual funds
As an analyst for a financial advisors group, you are responsible for providing statistics about
mutual funds that will be used by advisors to help inform clients about how a fund's objectives
might affect the returns of the fund. The file mutual fund returns.xls contains data on a
cross-section of 868 different funds. Write a report, to be read by the firm's clients, that describes
the differences between growth and value funds.
4 Target marketing
Of your customers, 24% have high income and 17% are well educated. Furthermore, 12% are both
high income and well educated. What does this information tell you about a marketing effort
that is currently reaching well-educated people, though you would really prefer to target
high-income people?
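One way to start on this question is to turn the three percentages into conditional probabilities. The sketch below is illustrative only: the variable names are assumptions introduced here, and the comparison against independence is just one possible angle, not a prescribed method.

```python
# Percentages given in the problem, expressed as probabilities.
p_high_income = 0.24
p_well_educated = 0.17
p_both = 0.12

# Share of well-educated customers who also have high income.
p_high_given_educated = p_both / p_well_educated

# Share of high-income customers who are also well educated.
p_educated_given_high = p_both / p_high_income

# If the two traits were independent, the joint probability would equal
# the product of the two marginal probabilities.
p_both_if_independent = p_high_income * p_well_educated

print(f"P(high income | well educated) = {p_high_given_educated:.4f}")
print(f"P(well educated | high income) = {p_educated_given_high:.4f}")
print(f"P(both) if independent         = {p_both_if_independent:.4f}")
```

Comparing the first printed value with the unconditional 24% shows how much more (or less) common high income is among the people the campaign already reaches.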
5 Effects of smoking
You work for the American Cancer Society, which has just received access to a major longitudinal
study of the effects of smoking. The study is based on a random sample of U.S. women taken
twenty years ago, who were asked several questions, one of which was whether the person smoked or not.
Because the study is longitudinal, it was determined whether these same women were alive or not twenty
years later. An intern has been working on the data, in anticipation of writing a report that will
be distributed to the news media. She has worked up the following preliminary table:
              Survival Status
Smoker     Alive    Dead    Total
Yes          443     139      582
No           502     230      732
Total        945     369     1314
That is, Smoker is whether the person was a smoker when interviewed in the first wave of the
longitudinal survey twenty years ago, and Survival Status is whether the person was dead or
alive twenty years later. From this table, the intern then calculates the following probabilities:
$$P(\text{Dead} \mid \text{Smoker}) = \frac{139}{582} = 0.2388$$

$$P(\text{Dead} \mid \text{Smoker}') = \frac{230}{732} = 0.3142$$
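The intern's two figures can be reproduced directly from the counts in the preliminary table. The following is a minimal sketch, assuming only those four cell counts; the dictionary layout and the helper name `death_rate` are hypothetical choices made for illustration. It checks the arithmetic only and says nothing about why the comparison comes out this way.

```python
# Cell counts copied from the intern's preliminary table.
counts = {
    "smoker": {"alive": 443, "dead": 139},
    "nonsmoker": {"alive": 502, "dead": 230},
}


def death_rate(group):
    """Return P(Dead | group), i.e. dead / (alive + dead) for one table row."""
    row = counts[group]
    return row["dead"] / (row["alive"] + row["dead"])


p_dead_smoker = death_rate("smoker")
p_dead_nonsmoker = death_rate("nonsmoker")

# Row totals, to confirm they match the Total column of the table.
total_smokers = sum(counts["smoker"].values())
total_nonsmokers = sum(counts["nonsmoker"].values())

print(f"P(Dead | Smoker)     = {p_dead_smoker:.4f}  (n = {total_smokers})")
print(f"P(Dead | Non-smoker) = {p_dead_nonsmoker:.4f}  (n = {total_nonsmokers})")
```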
The intern has carried these results to your supervisor, who glances at them and says, "There's
obviously something wrong with these calculations, since you are more likely to die within
twenty years if you are not a smoker." He hands the results to you, along with the spreadsheet
smoking.xlsx, and says, "Here, fix this, and then write that section of the report."