Quantitative Methods for Business

You work for a marketing company that holds a demographic dataset for audience analytics. Along with this
dataset, you receive a document listing all of the cities in which the company operates. Before you can begin
your analysis, you first need to collect and process this city data so that it can be joined with the
demographic data and support a proper analysis for your manager. You also want to follow best practices so that
your colleagues can reuse your city dataset and possibly contribute to it in the future.
Data
City: Boston
State: MA
Latitude: 42.3188
Longitude: -71.0846
Population: 4,637,537
Input Date: 2020-02-01
City: Houston
State: TX
Latitude: 29.7869
Longitude: -95.3905
Population: 5446468
Input Date: 2020/02/01
City: Dallas
State: T.X.
Latitude: 32.79
Longitude: -96.7662
Population:
Input Date: 02/01/2020
City: San Francisco
State: CA
Latitude: 37.75
Longitude: -122.443
Population: 3603761
Input Date: 2020-02-01
City: Los Angeles
State: california
Latitude: 34.1139
Longitude: -118.4068
Population: -
Input Date: 2020-02-01
City: Miami
State: FL
Longitude: -80.2102
Population: 6381966
Input Date: 2020-02-01
City: Manhattan
State: ny
Latitude: 40.7834
Longitude: -73.96
Population: 1643734.00
Input Date: 2020-02-01
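
The listing above can be transcribed without any cleaning and then reshaped. The following is a minimal Python sketch of that idea, not part of the assignment, which is meant to be done in Google Sheets. It assumes the common convention in which a wide dataset has one row per city and a tall dataset has one row per (id, variable, value) triple; the lecture's definitions may differ. The names `id`, `variable`, and `value` are hypothetical, and missing fields are stored as empty strings.

```python
# Column names as they appear in the listing.
FIELDS = ["City", "State", "Latitude", "Longitude", "Population", "Input Date"]

# Raw records copied verbatim as strings; nothing is cleaned at this stage.
# An empty string marks a field that is absent or blank in the listing.
RAW = [
    ["Boston", "MA", "42.3188", "-71.0846", "4,637,537", "2020-02-01"],
    ["Houston", "TX", "29.7869", "-95.3905", "5446468", "2020/02/01"],
    ["Dallas", "T.X.", "32.79", "-96.7662", "", "02/01/2020"],
    ["San Francisco", "CA", "37.75", "-122.443", "3603761", "2020-02-01"],
    ["Los Angeles", "california", "34.1139", "-118.4068", "-", "2020-02-01"],
    ["Miami", "FL", "", "-80.2102", "6381966", "2020-02-01"],
    ["Manhattan", "ny", "40.7834", "-73.96", "1643734.00", "2020-02-01"],
]

# Wide form: one row per city, one column per variable, plus an index column.
# The "id" column name is an illustrative choice.
wide = [{"id": i, **dict(zip(FIELDS, row))} for i, row in enumerate(RAW, start=1)]

# Tall form (assumed convention): one row per (id, variable, value) triple.
tall = [
    {"id": record["id"], "variable": name, "value": record[name]}
    for record in wide
    for name in FIELDS
]

print(len(wide), "wide rows;", len(tall), "tall rows")
```

Keeping every value as a string at this stage preserves the raw data exactly, so any later correction can be traced back to what was originally received.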
Assignment
Perform the following steps using Google Sheets and provide a write-up in a Google Doc.

  1. Create a raw dataset from the above data.
  2. Create a tall and a wide dataset from your raw data.
    a. Explain the benefits and trade-offs of each as they pertain to your data.
    b. Move forward with your tall dataset.
  3. Define your values, variables, and observations, and explain why you made these
    decisions.
  4. Perform the following data cleaning checks on your dataset. Report your findings (there may be
    none) and what you will do to correct them.
    a. Validity checks
      i. Data types
      ii. Ranges
      iii. Missing
      iv. Unique
      v. Membership
      vi. Regex
    b. Completeness
    c. Uniformity
  5. Create a data dictionary.
  6. Correct your curated dataset to ensure a valid set of data.
  7. Write a README.
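
The validity checks in step 4 can be illustrated outside of Google Sheets. The sketch below is a hedged Python example, not a prescribed solution: it runs each check on three rows copied from the data listing. The allowed state codes, the ISO date pattern, the coordinate bounds, and the choice of (City, State) as the uniqueness key are all assumptions made for illustration.

```python
import re
from collections import Counter

# Three rows copied from the data listing, kept as raw strings.
rows = [
    {"City": "Boston", "State": "MA", "Latitude": "42.3188",
     "Longitude": "-71.0846", "Population": "4,637,537", "Input Date": "2020-02-01"},
    {"City": "Dallas", "State": "T.X.", "Latitude": "32.79",
     "Longitude": "-96.7662", "Population": "", "Input Date": "02/01/2020"},
    {"City": "Los Angeles", "State": "california", "Latitude": "34.1139",
     "Longitude": "-118.4068", "Population": "-", "Input Date": "2020-02-01"},
]

# Assumed rules (illustrative, not given in the assignment).
VALID_STATES = {"MA", "TX", "CA", "FL", "NY"}          # membership
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")            # regex
RANGES = {"Latitude": (-90.0, 90.0), "Longitude": (-180.0, 180.0)}
MISSING_MARKERS = {"", "-"}


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_row(row):
    """Return a list of (field, failed check) pairs for one row."""
    findings = []
    # Missing: blank cells and placeholder dashes.
    for field, value in row.items():
        if value.strip() in MISSING_MARKERS:
            findings.append((field, "missing"))
    # Data type: a present population should contain digits only.
    population = row["Population"].strip()
    if population not in MISSING_MARKERS and not population.isdigit():
        findings.append(("Population", "data type"))
    # Range: coordinates must fall within geographic bounds.
    for field, (low, high) in RANGES.items():
        value = row[field]
        if is_number(value) and not low <= float(value) <= high:
            findings.append((field, "range"))
    # Membership: the state must be one of the allowed codes.
    if row["State"] not in VALID_STATES:
        findings.append(("State", "membership"))
    # Regex: the date must match YYYY-MM-DD exactly.
    if not ISO_DATE.fullmatch(row["Input Date"]):
        findings.append(("Input Date", "regex"))
    return findings


def duplicate_keys(rows):
    """Unique: return (City, State) pairs that occur more than once."""
    counts = Counter((row["City"], row["State"]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


findings = {row["City"]: check_row(row) for row in rows}
duplicates = duplicate_keys(rows)
for city, problems in findings.items():
    print(city, problems)
```

Each check only reports a finding and leaves the value untouched, which keeps the findings step separate from the correction step, as the task list asks.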
Be sure to follow the best practices below (as outlined in lecture) when developing your solutions:

  1. Consistency
  2. Named ranges
  3. Organization
  4. Naming conventions (files and variables)
  5. Dates
  6. Missing data
  7. Formatting
  8. Index column
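
The date, missing-data, and consistency practices above can be sketched as small normalization helpers. This Python example is illustrative only and rests on stated assumptions: slash dates such as 02/01/2020 are read as month/day/year to agree with the other rows, missing populations become `None` (a blank cell in a spreadsheet), and the state-name lookup is a hypothetical table that covers only the full name appearing in the data.

```python
from datetime import datetime

# Date layouts observed in the data listing. Reading "02/01/2020" as
# month/day/year is an assumption based on the other rows.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y")

# Hypothetical lookup for full state names; extend as needed.
STATE_NAMES = {"CALIFORNIA": "CA"}

MISSING_MARKERS = {"", "-"}


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_state(text):
    """Return an upper-case state code without dots, mapping known full names."""
    cleaned = text.replace(".", "").strip().upper()
    return STATE_NAMES.get(cleaned, cleaned)


def normalize_population(text):
    """Return the population as an int, or None when the value is missing."""
    cleaned = text.replace(",", "").strip()
    if cleaned in MISSING_MARKERS:
        return None
    return int(float(cleaned))


print(normalize_date("02/01/2020"), normalize_state("T.X."), normalize_population("4,637,537"))
```

Returning `None` for unrecognized dates and missing populations, rather than guessing a value, keeps the gaps visible so they can be documented in the data dictionary and README.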

Sample Solution