Detection and treatment of outliers.
Develop a better understanding of the
HBAT dataset more specifically to explore the characteristics of its customers and the
relationship between their perception of HBAT and their actions towards HBAT.
Make sure to address all the following questions.
- [2.5 points]. Run a thorough univariate and bivariate graphical and statistical
examination of your data. Do you notice any irregularities in your data? How does your
data look like? Normal, skewed?
Tips. Make sure to number and label each table and graph. (e.g., Table 1. Summary
statistics for HBA missing data set).
Provide a title and a detailed interpretation of each chart.
Remember the rule of thumb: for any table and/or graph, you can have at most 3
genuine findings. If you have more, you have probably made them up (����). - [2.0 points]. Missing values analysis. Do you have any missing values in your data? If
yes, determine the extent of missing values per variable and case. Are there any
variables/cases that you need to delete? Use ≥ 30% of missing values as the threshold
for deletion.
a. After deleting variables and/or cases with ≥ 30% missing data, construct a
summary statistic for your data. Do you still observe any missing values? If yes,
decide on how to impute these missing values. Limit your imputation
technique to mean or median substitution. Justify your choice. - [2.0 points]. Detection and treatment of outliers.
Math and Statistics for Analytics Nov 2020
3
Are there any univariate outliers in your dataset? Use both the Tukey´s fences and z
score approach (with z threshold set at 2.5 since you have a small sample size) to
identify them. Do you notice any discrepancy between the two methods? Explain.
How many values were detected as outliers? Will you keep these outliers or delete
them? Justify your decision. Discuss impact of your decision on remaining data
analysis. - [2.5 points]. After treating missing values and outliers, construct a summary statistic of
your data. Compare and contrast your results with question 1. Develop two
hypothetical questions that you can answer using graphical and/or empirical
techniques. Provide correct answers. - [1.0 point]. In the case that you did not treat missing values and/or outliers, what
would be the impact on subsequent data analysis.
Dataset: HBAT Industries Dataset – HBAT missing. (You can access the dataset in Additional
Documentation / Assignment 1)
Context: HBAT is a manufacturer of paper products. Hypothetical dataset based on surveys of
HBAT customers completed on a secure web site managed by an established marketing
research company.
Sample size: 70 observations on 14 separate variables based on a market segmentation study
of HBAT customers: newsprints industry and the magazine industry.
Categories of data:
• Numerical variables: V1 to V9.
• Categorical variable: V10 to V14.
Additional information related to the variables is available in the excel file (HBAT missing) in
the Metadata spreadsheet.
Pre-requisite: before working on this assignment, you need to watch the series of videos: Data
examination – Excel (Campus Online / Additional Documentation / Module 2 / Recorded
Videos)
For detecting outliers, you are already familiar with the boxplot method as well as the Tukey´s
fences (video), you can also use the z score approach, to calculate the z value for each
observation, you can use Excel built-in function (STANDARDIZE).
For missing value analysis, you can use the COUNT function to count the numbers of cells
containing data in a range that contains numbers and use it to determine the extent of missing
values per case and per variable.
Additional resources:
Introduction to data analysis in Excel: https://www.youtube.com/watch?v=Rs4082ewxgA
Introduction to Pivot tables: https://www.youtube.com/watch?v=9NUjHBNWe9M
Sample Solution
The United States is home to probably the most infamous and productive chronic executioners ever. Names, for example, Ted Bundy, Gary Ridgeway, and the Zodiac Killer have become easily recognized names because of the awful idea of their wrongdoings. Perhaps the most productive chronic executioners in American history is John Wayne Gacy. Nicknamed the Killer Clown in light of his calling, Gacy assaulted and killed at any rate 33 adolescent young men and young fellows somewhere in the range of 1972 and 1978, which is one of the most noteworthy realized casualty tallies. Gacy’s story has become so notable that his violations have been included in mainstream society and TV shows, for example, American Horror Story: Hotel and Criminal Minds. Legal science has, and proceeds to, assume a significant part in the settling of the case and ID of the people in question. John Wayne Gacy’s set of experiences of sexual and psychological mistreatment was instrumental in provoking specialist’s curiosity of him as a suspect. John Wayne Gacy was brought into the world on March 17, 1942, in Chicago, Illinois. Being the solitary child out of three kids, Gacy had a stressed relationship with his dad, who drank intensely and was regularly damaging towards the whole family (Sullivan and Maiken 48). In 1949, a project worker, who was a family companion, would stroke Gacy during rides in his truck; be that as it may, Gacy never uncovered these experiences to his folks because of a paranoid fear of revenge from his dad (Foreman 54). His dad’s mental maltreatment proceeded into his young grown-up years, and Gacy moved to Las Vegas where he worked momentarily in the rescue vehicle administration prior to turning into a funeral home orderly (Sullivan and Maiken 50). As a funeral home orderly, Gacy was vigorously associated with the treating cycle and conceded that one night, he moved into the casket of a perished high school kid and stroked the body (Cahill and Ewing 46). Stunned at himself, Gacy re-visitations of Chicago to live with his family and graduates from Northwestern Business College in 1963, and acknowledges an administration student position with Nunn-Bush Shoe Company. In 1964, Gacy is moved to Springfield and meets his future spouse, Marlynn Myers. In Springfield, Gacy has his subsequent gay experience when a collaborator shakily performed oral sex on him (London 11:7). Gacy moves to Waterloo, Iowa, and starts a family with Myers. Nonetheless, after consistently undermining his significant other with whores, Gacy submits his previously known rape in 1967 upon Donald Vorhees. In the coming months, Gacy explicitly mishandles a few different adolescents and is captured and accused of oral homosexuality (Sullivan and Maiken 60). On December 3, 1968, Gacy is indicted and condemned to ten years at the Anamosa State Penitentiary. Gacy turns into a model detainee at Anamosa and is allowed parole in June of 1970, an only a short time after his condemning. He had to migrate to Chicago and live with his mom and notice a 10:00PM time limit. Not exactly a year later, Gacy is accused again of explicitly attacking an adolescent kid yet the young didn’t show up in court, so the charges were dropped. Gacy was known by numerous individuals locally to be a devoted volunteer and being dynamic in local area legislative issues. His part as “Pogo the Clown” the comedian started in 1975 when Gacy joined a nearby “Carefree Joker” jokester club that consistently performed at raising support occasions. On January 3, 1972, Gacy submits his first homicide of Timothy McCoy, a 16-year old kid venturing out from Michigan to Omaha. Guaranteeing that McCoy went into his room employing a kitchen blade, Gacy gets into an actual squabble with McCoy prior to cutting him over and again in the chest. Subsequent to understanding that McCoy had absentmindedly strolled into the stay with the blade while attempting to get ready breakfast, Gacy covers the body in his unfinished plumbing space. Gacy conceded in the meetings following his capture that slaughtering McCoy gave him a “mind-desensitizing climax”, expressing that this homicide was the point at which he “understood passing was a definitive rush” (Cahill and Ewing 349). Right around 2 years after the fact, Gacy submits his second homicide of a unidentified teen. Gacy choked the kid prior to stuffing the body in his wardrobe prior to covering him (Cahill 349). In 1975, Gacy’s business was developing rapidly and his hunger for young fellows developed with it. Gacy frequently baited youngsters under his work to his home, persuading them to place themselves in cuffs, and assaulting and tormenting them prior to choking them (Cahill 169-170). The greater part of Gacy’s killings occurred somewhere in the range of 1976 and 1978, the first of this time occurring in April 1976. A large number of the adolescents that were killed during this time were covered in an unfinished plumbing space under Gacy’s home. For the rest of the homicides, Gacy confessed to losing five bodies the I-55 scaffold into the Des Plaines River; notwithstanding, just four of the bodies were ever recuperated (Linedecker 152). In December 1978, Gacy meets Robert Jerome Piest, a 15-year old kid working at a drug store and offers him a work at Gacy’s firm. Piest advises his mom regarding this and neglects to restore that night. The Piest family documents a missing individual’s report and the drug specialist educates police that Gacy would undoubtedly be the man that Jerome addressed about a work. When addressed by the police, Gacy denied any inclusion in Piest’s vanishing. Nonetheless, the police were not persuaded, and Gacy’s set of experiences of sexual maltreatment and battery incited the police to look through his home. Among the things found at Gacy’s home were a 1975 secondary school class ring with the initials J.A.S., numerous driver’s licenses, binds, apparel that was excessively little for Gacy, and a receipt for the drug store that Piest had worked at. Throughout the span of the following not many days, specialists got various calls and tips about Gacy’s rapes and the baffling vanishings of Gacy’s representatives. The class ring was in the long run followed back to John A. Szyc, one of Gacy’s casualties in 1977. Futhermore, after looking at Gacy’s vehicle, agents found a little bunch of strands taking after human hair, which were shipped off the labs for additional examination. That very night, search canines were utilized to identify any hint of Piest in Gacy’s vehicle, and one of the canines demonstrated that Piest had, truth be told, been available in the vehicle. On December 20, 1977, under the pressure of consistent police observation and examination, Gacy admits to more than 30 killings and educates his legal advisor and companion where the bodies were covered, both in the unfinished plumbing space and the stream. 26 casualties were found in the unfinished plumbing space and 4 in the waterway. Gacy is captured, indicted for 33 homicides, and condemned to death by deadly infusion. He endeavored a madness request yet was denied, and was executed on May 10, 1994. There were a few scientific pointers that agents used to attach Gacy to the killings. A portion of these include fiber investigation, dental and radiology records, utilizing the deterioration cycle of the human body, and facial recreation in distinguishing the people in question. Examiners discovered strands that looked like human hair in both Gacy’s vehicle and close to the unfinished plumbing space where the bodies were covered. Notwithstanding these hair tests, agents likewise discovered strands that contained hints of Gacy’s blood and semen in a similar territory. Blood having a place with the casualties was found on a portion of the filaments, which would later straightforwardly attach Gacy to the violations. The filaments in Gacy’s vehicle were dissected by measurable researchers and coordinated Piest’s hair tests. Besides, the pursuit canines that confirmed that Piest had been in Gacy’s vehicle demonstrated this by a “demise response”, which told agents that Piest’s dead body had been within Gacy’s vehicle. Out of Gacy’s 33 known casualties, just 25 were ever indisputably distinguished. A significant number of Gacy’s casualties had comparative actual portrayals and were consequently difficult to distinguish by simply asking people in general. To recognize the people in question, agents went to Betty Pat Gatliff, a pioneer in criminological science and facial recreation. Facial remaking is the way toward reproducing the facial highlights of a person by utilizing their remaining parts. Certain facial highlights, for example, facial structures, nasal design, and generally face shape can be helpful in distinguishing a casualty even long in the afterlife. By utilizing these highlights, and with the assistance of program, scientific specialists can make a picture of an individual’s face, which is instrumental in recognizing casualties after their bodies have rotted. Facial recreation should be possible in a few measurements. Two-dimensional facial recreations is utilized with skull radiographs and depend on pre-demise photos and data. Notwithstanding, this isn’t really ideal in light of the fact that cranial highlights are not generally noticeable or at the correct scale (Downing). To get a reasonable and more exact portrayal of the casualty’s face, a craftsman and a measurable anthropologist are typically essential (Downing). Three-dimensional facial reproduction is finished by figures or high goal, three-dimensional pictures. PC programs can make facial reproductions by controlling examined photos of the remaining parts and use approximations to reproduce facial highlights. These will in general create results that don’t look fake (Reichs and Craig 491). Some of the time, agents will utilize a strategy called superimposition as a method for facial recreation. Sadly, it’s anything but an ordinarily utilized technique, as it expects agents to have some information about the personality of the remaining parts they are managing. By superimposing a photo of a person over the skeletal remaining parts, examiners can check whether the facial highlights line up with the anatomical highlights, permitting them to recognize a casualty. On account of John Wayne Gacy’s casualties, specialists had the option to utilize facial reproduction to distinguish nine of the bodies found in the unfinished plumbing space. The accompanying realistic shows the facial reproductions of these nine casualties: Since facial reproduction was sufficiently not to distinguish the entirety of the v>
GET ANSWER