Copyright © 2017, Elsevier Inc. All rights reserved.

Calculating Descriptive Statistics

There are two major classes of statistics: descriptive statistics and inferential statistics. Descriptive statistics are computed to reveal characteristics of the sample data set and to describe study variables. Inferential statistics are computed to gain information about effects and associations in the population being studied. For some types of studies, descriptive statistics will be the only approach to analysis of the data. For other studies, descriptive statistics are the first step in the data analysis process, to be followed by inferential statistics.
For all studies that involve numerical data, descriptive statistics are crucial in understanding the fundamental properties of the variables being studied. Exercise 27 focuses only on descriptive statistics. It illustrates the most common descriptive statistics computed in nursing research and provides examples using actual clinical data from empirical publications.

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a statistic that represents the center or middle of a frequency distribution. The three measures of central tendency commonly used in nursing research are the mode, median (MD), and mean ($\bar{X}$). The mean is the arithmetic average of all of a variable's values. The median is the exact middle value (or the average of the middle two values if there is an even number of observations). The mode is the most commonly occurring value or values (see Exercise 8).

The following data have been collected from veterans with rheumatoid arthritis (Tran, Hooker, Cipher, & Reimold, 2009). The values in Table 27-1 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-1 contains data collected from 10 veterans who had stopped taking biologic medications, and the variable represents the number of years that each veteran had taken the medication before stopping. Because the number of study subjects represented below is 10, the correct statistical notation to reflect that number is:

n = 10

Note that the n is lowercase, because we are referring to a sample of veterans. If the data being presented represented the entire population of veterans, the correct notation would be the uppercase N. Because most nursing research is conducted using samples, not populations, all formulas in the subsequent exercises will incorporate the sample notation, n.

Mode

The mode is the numerical value or score that occurs with the greatest frequency; it does not necessarily indicate the center of the data set. The data in Table 27-1 contain two modes: 1.5 and 3.0. Each of these numbers occurred twice in the data set. When two modes exist, the data set is referred to as bimodal; a data set that contains more than two modes would be multimodal.

Median

The median (MD) is the score at the exact center of the ungrouped frequency distribution. It is the 50th percentile. To obtain the MD, sort the values from lowest to highest. If the number of values is odd, the MD is the middle value, with equal numbers of values above and below it. If the number of values is even, the MD is the average of the two middle values, so the MD may not be an actual value in the data set. For example, the data in Table 27-1 consist of 10 observations, and therefore the MD is calculated as the average of the two middle values:

$$MD = \frac{1.5 + 2.0}{2} = 1.75$$

Mean

The most commonly reported measure of central tendency is the mean. The mean is the sum of the scores divided by the number of scores being summed. Thus, like the MD, the mean may not be a member of the data set. The formula for calculating the mean is as follows:

$$\bar{X} = \frac{\sum X}{n}$$

where
$\bar{X}$ = mean
$\sum$ = sigma, the statistical symbol for summation
$X$ = a single value in the sample
$n$ = total number of values in the sample
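The three measures defined above can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later (needed for `statistics.multimode`); the list name `duration` is our own and simply holds the Table 27-1 values. The mean is computed directly from the formula rather than with a library call.

```python
from statistics import median, multimode

# Table 27-1: years of biologic use for the n = 10 veterans
duration = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(duration)

modes = multimode(duration)    # every value tied for the highest frequency
md = median(duration)          # n is even, so this averages the two middle values
x_bar = sum(duration) / n      # X̄ = ΣX / n

print("modes:", modes)         # → modes: [1.5, 3.0]  (bimodal)
print("MD =", md)              # → MD = 1.75
print(f"mean = {x_bar:.2f} years")  # → mean = 1.89 years
```

Because `multimode` returns every value tied for the highest count, it reports both modes of this bimodal data set instead of silently picking one.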
The mean number of years that the veterans used a biologic medication is calculated as follows:

$$\bar{X} = \frac{0.1 + 0.3 + 1.3 + 1.5 + 1.5 + 2.0 + 2.2 + 3.0 + 3.0 + 4.0}{10} = \frac{18.9}{10} = 1.89 \approx 1.9 \text{ years}$$

TABLE 27-1 DURATION OF BIOLOGIC USE AMONG VETERANS WITH RHEUMATOID ARTHRITIS (n = 10)

Duration of Biologic Use (years)
0.1
0.3
1.3
1.5
1.5
2.0
2.2
3.0
3.0
4.0

The mean is an appropriate measure of central tendency for approximately normally distributed populations with variables measured at the interval or ratio level. It is also appropriate for ordinal level data such as Likert scale values, where higher numbers represent more of the construct being measured and lower numbers represent less of the construct (such as pain levels, patient satisfaction, depression, and health status).

The mean is sensitive to extreme scores such as outliers. An outlier is a value in a sample data set that is unusually low or unusually high in the context of the rest of the sample data. An example of an outlier in the data presented in Table 27-1 might be a value such as 11. The existing values range from 0.1 to 4.0, meaning that no veteran used a biologic beyond 4 years. If an additional veteran who had used a biologic for 11 years were added to the sample, the mean would be much larger: (18.9 + 11)/11 = 2.7 years. Adding this single outlier to the sample raised the mean by about 0.8 years, or roughly 40%. The outlier would also change the frequency distribution.
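The effect of the hypothetical 11-year outlier described above can be compared for the mean and the median side by side. This is a small sketch using the standard `statistics` module; `center` is a hypothetical helper name introduced only for this example.

```python
from statistics import mean, median

# Table 27-1 values plus the hypothetical 11th veteran (11 years) from the text
duration = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
with_outlier = duration + [11.0]

def center(values):
    """Return (mean, median) so the two measures can be compared side by side."""
    return mean(values), median(values)

base_mean, base_md = center(duration)
out_mean, out_md = center(with_outlier)

print(f"without outlier: mean = {base_mean:.2f}, MD = {base_md:.2f}")  # → 1.89, 1.75
print(f"with outlier:    mean = {out_mean:.2f}, MD = {out_md:.2f}")    # → 2.72, 2.00
```

The single extreme value moves the mean up by about 0.8 years but moves the median by only 0.25 years, which is why the median is relatively unaffected by outliers.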
Without the outlier, the frequency distribution is approximately normal, as shown in Figure 27-1. Including the outlier changes the shape of the distribution to appear positively skewed.

Although summary statistics have been the traditional approach to describing data or the characteristics of the sample before inferential statistical analysis, their ability to clarify the nature of data is limited. For example, using measures of central tendency, particularly the mean, to describe the nature of the data obscures the impact of extreme values or deviations in the data. Thus, significant features in the data may be concealed or misrepresented. Often, anomalous, unexpected, or problematic data and discrepant patterns are evident but are not regarded as meaningful. Measures of dispersion, such as the range, difference scores, variance, and standard deviation (SD), provide important insight into the nature of the data.

MEASURES OF DISPERSION

Measures of dispersion, or variability, are measures of individual differences of the members of the population and sample. They indicate how values in a sample are dispersed around the mean. These measures provide information about the data that is not available from measures of central tendency. They indicate how different the scores are: the extent to which individual values deviate from one another. If the individual values are similar, measures of variability are small and the sample is relatively homogeneous in terms of those values. Heterogeneity (wide variation in scores) is important in some statistical procedures, such as correlation. Heterogeneity is determined by measures of variability. The measures most commonly used are range, difference scores, variance, and SD (see Exercise 9).

FIGURE 27-1 ■ FREQUENCY DISTRIBUTION OF YEARS OF BIOLOGIC USE, WITHOUT OUTLIER AND WITH OUTLIER. (Two histograms: without outlier, bins 0–0.9 through 4–4.9 years; with outlier, bins 0–0.9 through 11–11.9 years. x-axis: years of biologic use; y-axis: frequency.)

Range

The simplest measure of dispersion is the range. In published studies, range is presented in two ways: (1) the range is the lowest and highest scores, or (2) the range is calculated by subtracting the lowest score from the highest score. The range for the scores in Table 27-1 is 0.1 and 4.0, or it can be calculated as follows: 4.0 − 0.1 = 3.9. In this form, the range is a difference score that uses only the two extreme scores for the comparison. The range is generally reported but is not used in further analyses.

Difference Scores

Difference scores are obtained by subtracting the mean from each score. Sometimes a difference score is referred to as a deviation score because it indicates the extent to which a score deviates from the mean. Of course, most variables in nursing research are not “scores,” yet the term difference score is used to represent a value's deviation from the mean. The difference score is positive when the score is above the mean, and it is negative when the score is below the mean (see Table 27-2). Difference scores are the basis for many statistical analyses and can be found within many statistical equations. The formula for difference scores is:

$$X - \bar{X}$$

TABLE 27-2 DIFFERENCE SCORES OF DURATION OF BIOLOGIC USE
X      X̄      X − X̄
0.1    1.9    −1.8
0.3    1.9    −1.6
1.3    1.9    −0.6
1.5    1.9    −0.4
1.5    1.9    −0.4
2.0    1.9     0.1
2.2    1.9     0.3
3.0    1.9     1.1
3.0    1.9     1.1
4.0    1.9     2.1
Σ of absolute values = 9.5

The mean deviation is the average difference score, using the absolute values. The formula for the mean deviation is:

$$\bar{X}_{\text{deviation}} = \frac{\sum \lvert X - \bar{X} \rvert}{n}$$

In this example, the mean deviation is 0.95. This value was calculated by taking the sum of the absolute value of each difference score (1.8, 1.6, 0.6, 0.4, 0.4, 0.1, 0.3, 1.1, 1.1, 2.1), which is 9.5, and dividing by 10. The result indicates that, on average, subjects' duration of biologic use deviated from the mean by 0.95 years.

Variance

Variance is another measure commonly used in statistical analysis. The equation for a sample variance ($s^2$) is:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Note that the lowercase letter $s^2$ is used to represent a sample variance. The lowercase Greek sigma ($\sigma^2$) is used to represent a population variance, in which the denominator is N instead of n − 1. Because most nursing research is conducted using samples, not populations, formulas in the subsequent exercises that contain a variance or standard deviation will incorporate the sample notation, using n − 1 as the denominator. Moreover, statistical software packages such as SPSS compute the variance and standard deviation using the sample formulas, not the population formulas.

The variance is never negative (it equals zero only when every value is identical) and has no upper limit. In general, the larger the variance, the larger the dispersion of scores. The variance is most often computed to derive the standard deviation because, unlike the variance, the standard deviation is expressed in the same units as the original values and reflects important properties of the frequency distribution of the variable it represents. Table 27-3 displays how we would compute a variance by hand, using the biologic duration data:

$$s^2 = \frac{13.41}{9} = 1.49$$

TABLE 27-3 VARIANCE COMPUTATION OF BIOLOGIC USE
X      X̄      X − X̄    (X − X̄)²
0.1    1.9    −1.8     3.24
0.3    1.9    −1.6     2.56
1.3    1.9    −0.6     0.36
1.5    1.9    −0.4     0.16
1.5    1.9    −0.4     0.16
2.0    1.9     0.1     0.01
2.2    1.9     0.3     0.09
3.0    1.9     1.1     1.21
3.0    1.9     1.1     1.21
4.0    1.9     2.1     4.41
                       Σ = 13.41

Standard Deviation
Standard deviation is a measure of dispersion that is the square root of the variance. The standard deviation is represented by the notation s or SD. The equation for obtaining a standard deviation is:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Table 27-3 displays the computations for the variance. To compute the SD, simply take the square root of the variance.
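The range, the difference scores of Table 27-2, the mean deviation, and the variance and SD of Table 27-3 can be reproduced in a short Python sketch. As in the tables, it assumes the rounded mean of 1.9 for the hand computation; it then compares the result with the standard library's `statistics.variance` and `statistics.stdev` (n − 1 denominator) and `statistics.pvariance` (N denominator), which work from the unrounded mean of 1.89.

```python
import math
from statistics import pvariance, stdev, variance

duration = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(duration)
x_bar = 1.9  # rounded mean, as used in Tables 27-2 and 27-3

data_range = max(duration) - min(duration)         # highest minus lowest score
diffs = [round(x - x_bar, 1) for x in duration]    # difference scores X - X̄ (Table 27-2)
mean_deviation = sum(abs(d) for d in diffs) / n    # Σ|X - X̄| / n
ss = sum(d ** 2 for d in diffs)                    # Σ(X - X̄)², column total of Table 27-3
s2 = ss / (n - 1)                                  # sample variance uses n - 1
sd = math.sqrt(s2)                                 # SD is the square root of the variance

print(f"range = {data_range:.1f}, mean deviation = {mean_deviation:.2f}")
print(f"by hand: s2 = {s2:.2f}, SD = {sd:.2f}")

# The statistics module works from the unrounded mean (1.89).
print(f"variance (n - 1) = {variance(duration):.3f}, SD = {stdev(duration):.4f}")
print(f"pvariance (N)    = {pvariance(duration):.3f}")
```

The hand version gives a range of 3.9, a mean deviation of 0.95, $s^2$ = 1.49, and SD = 1.22. The library versions give 1.490 and 1.2206 with n − 1, but about 1.341 with N, which shows how much the denominator matters for a sample of only 10.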
We know that the variance of biologic duration is $s^2$ = 1.49. Therefore, the standard deviation of biologic duration is s = SD = √1.49 = 1.22. The SD is an important statistic, both for understanding dispersion within a distribution and for interpreting the relationship of a particular value to the distribution.

SAMPLING ERROR

A standard error describes the extent of sampling error. For example, a standard error of the mean is calculated to determine the magnitude of the variability associated with the mean.
A small standard error is an indication that the sample mean is close to the population mean, while a large standard error yields less certainty that the sample mean approximates the population mean. The formula for the standard error of the mean ($s_{\bar{X}}$) is:

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

Using the biologic medication duration data, we know that the standard deviation of biologic duration is s = 1.22. Therefore, the standard error of the mean for biologic duration is computed as follows:

$$s_{\bar{X}} = \frac{1.22}{\sqrt{10}} = 0.39$$

The standard error of the mean for biologic duration is 0.39.

Confidence Intervals

To determine how closely the sample mean approximates the population mean, the standard error of the mean is used to build a confidence interval. A confidence interval can be created for many statistics, such as a mean, proportion, and odds ratio. To build a confidence interval around a statistic, you must have the standard error value and the t value to adjust the standard error. The degrees of freedom (df) to use to compute a confidence interval for a mean is df = n − 1. To compute the confidence interval for a mean, multiply $s_{\bar{X}}$ by the t value for df = n − 1; the lower limit is the mean minus this product, and the upper limit is the mean plus this product. For a 95% confidence interval, the t value should be selected at α = 0.05 (two-tailed). For a 99% confidence interval, the t value should be selected at α = 0.01 (two-tailed).

Using the biologic medication duration data, we know that the standard error of the mean duration of biologic medication use is $s_{\bar{X}}$ = 0.39. The mean duration of biologic medication use is 1.89. Therefore, the 95% confidence interval for the mean duration of biologic medication use is computed as follows:

$$\bar{X} \pm s_{\bar{X}}(t) = 1.89 \pm (0.39)(2.26) = 1.89 \pm 0.88$$

As referenced in Appendix A, the t value required for the 95% confidence interval with df = 9 is 2.26. The computation above results in a lower limit of 1.01 and an upper limit of 2.77. This means that our confidence interval of 1.01 to 2.77 estimates the population mean duration of biologic use with 95% confidence (Kline, 2004). Technically and mathematically, it means that if we repeatedly drew samples of 10 veterans and computed a 95% confidence interval for the mean duration of biologic medication use from each sample, then in the long run 95% of those intervals would contain the true population mean and 5% would not (Gliner, Morgan, & Leech, 2009).

If we were to compute a 99% confidence interval, we would require the t value that is referenced at α = 0.01. Therefore, the 99% confidence interval for the mean duration of biologic medication use is computed as follows:

$$1.89 \pm (0.39)(3.25) = 1.89 \pm 1.27$$
As referenced in Appendix A, the t value required for the 99% confidence interval with df = 9 is 3.25. The computation above results in a lower limit of 0.62 and an upper limit of 3.16. This means that our confidence interval of 0.62 to 3.16 estimates the population mean duration of biologic use with 99% confidence.

Degrees of Freedom

The concept of degrees of freedom (df) was used in reference to computing a confidence interval. For any statistical computation, degrees of freedom are the number of independent pieces of information that are free to vary in order to estimate another piece of information (Zar, 2010). In the case of the confidence interval, the degrees of freedom are n − 1. This means that there are n − 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval. For example, once the mean of the 10 biologic durations is fixed at 1.89 (a total of 18.9 years), any 9 of the values can change freely, but the 10th value is then determined by the others.

SPSS COMPUTATIONS

A retrospective descriptive study examined the duration of biologic use from veterans with rheumatoid arthritis (Tran et al., 2009). The values in Table 27-4 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-4 contains simulated demographic data collected from 10 veterans who had stopped taking biologic medications. Age at study enrollment, duration of biologic use, race/ethnicity, gender (F = female), tobacco use (F = former use, C = current use, N = never used), primary diagnosis (3 = irritable bowel syndrome, 4 = psoriatic arthritis, 5 = rheumatoid arthritis, 6 = reactive arthritis), and type of biologic medication used were among the study variables examined.

TABLE 27-4 DEMOGRAPHIC VARIABLES OF VETERANS WITH RHEUMATOID ARTHRITIS
Patient ID  Duration (yrs)  Age  Race/Ethnicity                  Gender  Tobacco  Diagnosis  Biologic
1           0.1             42   Caucasian                       F       F        5          Infliximab
2           0.3             41   Black, not of Hispanic Origin   F       F        5          Etanercept
3           1.3             56   Caucasian                       F       N        5          Infliximab
4           1.5             78   Caucasian                       F       F        3          Infliximab
5           1.5             86   Black, not of Hispanic Origin   F       F        4          Etanercept
6           2.0             49   Caucasian                       F       F        6          Etanercept
7           2.2             82   Caucasian                       F       F        5          Infliximab
8           3.0             35   Caucasian                       F       N        3          Infliximab
9           3.0             59   Black, not of Hispanic Origin   F       C        3          Infliximab
10          4.0             37   Caucasian                       F       F        …          …

This is how our data set looks in SPSS.

Step 1: For a nominal variable, the appropriate descriptive statistics are frequencies and percentages. From the “Analyze” menu, choose “Descriptive Statistics” and “Frequencies.” Move “Race/Ethnicity” and “Gender” over to the right. Click “OK.”

Step 2: For a continuous variable, the appropriate descriptive statistics are means and standard deviations. From the “Analyze” menu, choose “Descriptive Statistics” and “Explore.” Move “Duration” over to the right. Click “OK.”

INTERPRETATION OF SPSS OUTPUT

The following tables are generated from SPSS. The first set of tables (from the first set of SPSS commands in Step 1) contains the frequencies of race/ethnicity and gender. Most (70%) were Caucasian, and 100% were female.
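The Frequencies step can be mimicked in a few lines of Python. This sketch is not SPSS's own algorithm: `frequency_table` is a hypothetical helper, the `race` and `gender` lists are transcribed from Table 27-4, and categories are simply listed in sorted order.

```python
from collections import Counter

# Race/ethnicity and gender columns from Table 27-4, patients 1-10 in order
race = [
    "Caucasian", "Black, not of Hispanic Origin", "Caucasian", "Caucasian",
    "Black, not of Hispanic Origin", "Caucasian", "Caucasian", "Caucasian",
    "Black, not of Hispanic Origin", "Caucasian",
]
gender = ["F"] * 10

def frequency_table(values):
    """Return (category, frequency, percent, cumulative percent) rows.

    Categories are listed in sorted order. With no missing values,
    SPSS's "Valid Percent" equals "Percent", so it is not repeated here.
    """
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / total
        cumulative += percent
        rows.append((category, counts[category], round(percent, 1), round(cumulative, 1)))
    return rows

for name, column in (("Race/Ethnicity", race), ("Gender", gender)):
    print(name)
    for category, freq, pct, cum in frequency_table(column):
        print(f"  {category:<30} {freq:>3} {pct:>6.1f} {cum:>6.1f}")
```

The race/ethnicity rows come out as 3 (30.0%) and 7 (70.0%, cumulative 100.0%), and gender as 10 (100.0%), matching the percentages reported in the text.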
Frequencies

Frequency Table

Race/Ethnicity
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Black, not of Hispanic Origin    3          30.0     30.0           30.0
       Caucasian                        7          70.0     70.0           100.0
       Total                            10         100.0    100.0

Gender
                                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid  F                                10         100.0    100.0

Explore

Descriptives
                                                                   Statistic  Std. Error
Duration of Biologic Use  Mean                                     1.890      .3860
                          95% Confidence Interval for Mean  Lower Bound  1.017
                                                            Upper Bound  2.763
                          5% Trimmed Mean                          1.872
                          Median                                   1.750
                          Variance                                 1.490
                          Std. Deviation                           1.2206
                          Minimum                                  .1
                          Maximum                                  4.0
                          Range                                    3.9
                          Interquartile Range                      2.0
                          Skewness                                 .159       .687
                          Kurtosis                                 -.437      1.334

The second set of output (from the second set of SPSS commands in Step 2) contains the descriptive statistics for “Duration,” including the mean, s (standard deviation), SE (standard error of the mean), 95% confidence interval for the mean, median, variance, minimum value, maximum value, range, and skewness and kurtosis statistics. As shown in the output, the mean number of years for duration is 1.89, and the SD is 1.22. The 95% CI is 1.02–2.76.
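Most of the Explore output above can be approximated with the Python standard library alone. This is a minimal sketch under stated assumptions: the t critical value is hard-coded because the `statistics` module has no t distribution, the bias-adjusted skewness and kurtosis formulas are assumed (they happen to reproduce the .159 and −.437 shown), and the 5% trimmed mean is omitted. It also illustrates why SPSS reports 1.02–2.76 while the hand computation gave 1.01–2.77: SPSS carries the unrounded standard error (.3860) and t value through the calculation, whereas the text rounds both.

```python
import math
from statistics import mean, median, quantiles, stdev, variance

duration = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(duration)

# Assumption: two-tailed critical t for alpha = 0.05 and df = 9, copied from a
# t table (Appendix A rounds it to 2.26); the standard library has no t quantile.
T_CRIT_95_DF9 = 2.262157

x_bar = mean(duration)
s = stdev(duration)                  # sample SD (n - 1 denominator)
se = s / math.sqrt(n)                # standard error of the mean

ci_spss = (x_bar - T_CRIT_95_DF9 * se, x_bar + T_CRIT_95_DF9 * se)  # unrounded inputs
ci_hand = (1.89 - 2.26 * 0.39, 1.89 + 2.26 * 0.39)                   # rounded inputs from the text

# 'exclusive' quartiles place the pth percentile at position (n + 1)p.
q1, _, q3 = quantiles(duration, n=4, method="exclusive")
iqr = q3 - q1

# Bias-adjusted skewness and kurtosis computed from z-scores.
z = [(x - x_bar) / s for x in duration]
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(f"mean {x_bar:.3f}  SE {se:.4f}")
print(f"95% CI, unrounded inputs: {ci_spss[0]:.3f} to {ci_spss[1]:.3f}")
print(f"95% CI, rounded inputs:   {ci_hand[0]:.2f} to {ci_hand[1]:.2f}")
print(f"median {median(duration):.3f}  variance {variance(duration):.3f}  SD {s:.4f}")
print(f"min {min(duration)}  max {max(duration)}  range {max(duration) - min(duration):.1f}")
print(f"IQR {iqr:.2f}  skewness {skewness:.3f}  kurtosis {kurtosis:.3f}")
```

The interquartile range comes out as 1.95, which would display as 2.0 at one decimal place. Treating Python's "exclusive" quartile method as equivalent to the percentile definition SPSS uses here is an assumption that happens to match this output, not a documented equivalence.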
STUDY QUESTIONS

1. Define mean.
2. What does this symbol, $s^2$, represent?
3. Define outlier.
4. Are there any outliers among the values representing duration of biologic use?
5. How would you interpret the 95% confidence interval for the mean of duration of biologic use?
6. What percentage of patients were Black, not of Hispanic origin?
7. Can you compute the variance for duration of biologic use by using the information presented in the SPSS output above?
8. Plot the frequency distribution of duration of biologic use.
9. Where is the median in relation to the mean in the frequency distribution of duration of biologic use?
10. When would a median be more informative than a mean in describing a variable?

Answers to Study Questions
1. The mean is defined as the arithmetic average of a set of numbers.
2. $s^2$ represents the sample variance of a given variable.
3. An outlier is a value in a sample data set that is unusually low or unusually high in the context of the rest of the sample data.
4. There are no outliers among the values representing duration of biologic use.
5. The 95% CI is 1.02–2.76, meaning that our confidence interval of 1.02–2.76 estimates the population mean duration of biologic use with 95% confidence.
6. 30% of patients were Black, not of Hispanic origin.
7. Yes, the variance for duration of biologic use can be computed by squaring the SD presented in the SPSS table. The SD is listed as 1.22, and, therefore, the variance is 1.22², or 1.49.
8. The frequency distribution approximates the following plot:
(Histogram of duration of biologic use. x-axis: duration, 0 to 5.0 years; y-axis: frequency. Mean = 1.89, Std. Dev. = 1.221, N = 10.)
9. The median is 1.75 and the mean is 1.89. Therefore, the median is lower than the mean in the frequency distribution of duration of biologic use.
10. A median can be more informative than a mean in describing a variable when the variable's frequency distribution is positively or negatively skewed. While the mean is sensitive to outliers, the median is relatively unaffected.

Questions to Be Graded

EXERCISE 27

Follow your instructor's directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”

Name: ______________________  Class: ______________________  Date: ______________________

1. What is the mean age of the sample data?
2. What percentage of patients never used tobacco?
3. What is the standard deviation for age?
4. Are there outliers among the values of age? Provide a rationale for your answer.
5. What is the range of age values?
6. What percentage of patients were taking infliximab?
7. What percentage of patients had rheumatoid arthritis as their primary diagnosis?
8. What percentage of patients had irritable bowel syndrome as their primary diagnosis?
9. What is the 95% CI for age?
10. What percentage of patients had psoriatic arthritis as their
primary diagnosis?

Calculating Pearson Product-Moment Correlation Coefficient

Correlational analyses identify associations between two variables. There are many different kinds of statistics that yield a measure of correlation. All of these statistics address a research question or hypothesis that involves an association or relationship. Examples of research questions that are answered with correlation statistics are, “Is there an association between weight loss and depression?” and “Is there a relationship between patient satisfaction and health status?” A hypothesis is developed to identify the nature (positive or negative) of the relationship between the variables being studied.

The Pearson product-moment correlation was the first of the correlation measures developed and is the most commonly used. As is explained in Exercise 13, this coefficient (statistic) is represented by the letter r, and the value of r is always between −1.00 and +1.00. A value of zero indicates no linear relationship between the two variables. A positive correlation indicates that higher values of x are associated with higher values of y. A negative or inverse correlation indicates that higher values of x are associated with lower values of y. The r value is indicative of the slope of the line (called a regression line) that can be drawn through a standard scatterplot of the two variables (see Exercise 11). The strengths of different relationships are identified in Table 28-1 (Cohen,