Social media analytics to gain insight

Think of a case study in which you can apply social media analytics to gain insight about how a certain artist or band can improve their popularity. The assignment consists of 3 parts. In the first part, you will need to describe the setting for your case study. In the second part, you will need to apply social media analytics, using the tools introduced during the labs. In the third part, you will need to evaluate your findings and determine appropriate future actions.

You are required to use software for analysis and produce a written report. The report accounts for the majority of points for each question. Simply pasting screenshots of your analysis outputs will not give you full points. Accuracy and reproducibility of your code will be checked.

Choose data sources and data that are appropriate for your case study. Pay attention to how much data you retrieve and how frequently you retrieve data. If you request large amounts of data too often, the APIs will impose a rate-limit on your account. However, you can continue once the rate-limit has expired.

Use the software introduced in the labs (RStudio, Tableau…). Add headings in your R scripts so that we can easily find the code related to each question. Export all datasets as .RData files (so that we can re-run your code if needed). Add screenshots of your results. Make plots/visualisations wherever possible, using R functions, Gephi, and Tableau. (Most results can be displayed as a plot/visualisation!)

Part 1 - Case Study Setting

Choose a well-known artist or band. Assume you are the artist’s/band’s manager and want to help improve their popularity by using social media analytics. Your chosen artist/band should already be well known so that enough social media data related to them exists. Otherwise, you may not be able to retrieve enough useful data for performing the analytical steps later.

1) Describe the artist/band you are managing by using data from the Spotify API.
Add additional information from other sources (e.g., Wikipedia). How many years have they been active? How many albums & songs have they published? What are the prevalent features of their songs (e.g., valence)? [1-2 paragraphs, 1.5 points]

2) Describe the purpose of using social media analytics for your case study. For example: How do you want to improve the popularity? How can social media analytics help you achieve it? What kind of social media data do you want to analyse? What is your hypothesis (expectation) about the analysis outcome? [2-3 paragraphs, 1.5 points]

Part 2 - Social Media Analytics [23 points]

Perform data selection & exploration, data pre-processing, data analysis, and visualisation for your case study.

Part 2 A - Data Selection & Exploration

1) Select social media platforms (Twitter, YouTube, Spotify) and retrieve data. Make sure to choose keywords for data retrieval that are most relevant to your artist/band. However, try not to be too narrow. As a general guide, you should retrieve at least 1000 data items (e.g., tweets from Twitter or comments for YouTube videos). Explain what you have done.

2) List the top 5 most influential users for your artist/band. Find out what other interests/characteristics they have besides those related to your artist/band. Do these 5 have something in common?

3) List the top 10 most important terms that appear together with your keyword(s) related to your artist/band. Explain the results.

4) For your Twitter dataset, calculate how many of your retrieved tweets are retweets. Alternatively, if you filtered out retweets in your query, calculate how many unique user accounts there are in your dataset. What do the results tell you?

5) In your YouTube dataset, which videos have the highest number of views and likes? Do you see a correlation between views and likes?
(Your dataset may contain hundreds of videos, so it’s OK if you choose only a subset of those to get their statistics, in order to avoid hitting the rate-limit. However, you should get statistics for at least 20 videos.)

6) Find related artists/bands on Spotify and create a network graph. Did you find any interesting relationships?

(Maximum 5 points for Data Selection & Exploration)

Part 2 B - Text Pre-Processing

Conduct the following for cleaning textual data in your datasets.

1) Perform text cleaning and remove punctuation and symbols as much as appropriate. [0.5 point]

2) Remove stop words and perform stemming.

3) Create a Term-Document Matrix. What are the 10 terms occurring with the highest frequency? How do they differ from your answer to Part 2 A, question 3) above?

(Maximum 2 points for Pre-Processing)

Part 2 C - Social Network Analysis

1) Perform community analysis with the infomap, Girvan-Newman (edge betweenness) and Louvain methods. Explain how relevant the results are to your artist/band. Perform the community analysis also for related artists. Is their community structure similar?

2) Perform centrality analysis by detecting degree centrality, betweenness centrality, and closeness centrality. Explain how relevant the results are to your artist/band. What are the actual degree, betweenness, and closeness centrality scores for your artist/band node in the network? Compare these scores to the scores for related artists.

(Maximum 6 points for SNA)

Part 2 D - Machine Learning Models

1) Use k-means clustering to group a user’s friends (following) and followers. You have to identify one influential user related to your artist/band and analyse his/her friends and followers. Justify why he/she is an influential user. Explain the results. [1.5 points]

2) Build a decision tree and evaluate its performance in predicting whether a song is by your artist/band.

3) Use sentiment analysis to identify how the public reacts to events and/or topics related to your artist/band.
Provide a summary of public opinions (emotions, reactions).

4) Use LDA topic modelling to identify some terms that are closely related to your artist/band. Find at least 3 significant groups of words that can be meaningful to your analysis. Explain your findings.

(Maximum 7 points for Machine Learning Models)

Part 2 E - Visualisation

1) Plot the location of tweets from your Twitter dataset on a map using RStudio. Explain your findings. (If you do not have enough tweets with location data in your Twitter dataset that you have been using for the previous questions, you can run a new search to get a dataset with more location data.)

2) Plot the location of tweets from your Twitter dataset on a map using Tableau. Add other attributes as additional marks to your map (e.g., as colour, size). Explain your choice of attributes and visual marks.

3) Create at least two other charts from your datasets using Tableau and combine them together with your plot from the previous question into a dashboard. Explain the functionality of your dashboard.

Part 3 - Evaluation

1) What are the findings of your social media analytics? [2-4 paragraphs, 2 points]

2) What actions for improving the popularity of your artist/band do you suggest based on your findings?

3) How could you refine your social media analytics? For example: Could you use different data sources? Could you choose different parameters? Can you think of ways to obtain more relevant data?
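
As a rough illustration of the calculations behind Part 2 A, questions 4) and 5), the sketch below computes a retweet share, a unique-account count, and a Pearson correlation between views and likes. It is written in Python on invented toy records purely to show the arithmetic. The field names (user, is_retweet, views, likes) and all values are assumptions, not real API output, and the submitted analysis should still be carried out in R with the lab tools on data you retrieved yourself.

```python
from math import sqrt

# Hypothetical toy records; real data would come from the Twitter and YouTube APIs.
tweets = [
    {"user": "alice", "is_retweet": False},
    {"user": "bob", "is_retweet": True},
    {"user": "alice", "is_retweet": True},
    {"user": "carol", "is_retweet": False},
]
videos = [
    {"views": 1000, "likes": 100},
    {"views": 2000, "likes": 210},
    {"views": 3000, "likes": 290},
    {"views": 4000, "likes": 405},
]


def retweet_share(records):
    """Fraction of records flagged as retweets."""
    return sum(r["is_retweet"] for r in records) / len(records)


def unique_users(records):
    """Number of distinct user accounts in the records."""
    return len({r["user"] for r in records})


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


share = retweet_share(tweets)
n_users = unique_users(tweets)
r = pearson([v["views"] for v in videos], [v["likes"] for v in videos])

print(f"retweet share: {share:.2f}")
print(f"unique users: {n_users}")
print(f"views/likes correlation: {r:.3f}")
```

A coefficient close to 1 on real data would suggest that likes rise roughly in proportion to views, while a weaker value would be worth discussing in the report. With only 20 or so videos, treat any such coefficient as indicative rather than conclusive.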

Sample Solution