Thematic Exploration of Youtube Data: A Methodology for Discovering Latent Topics

Posted: 29 Aug 2017

Date Written: September 8, 2017

Abstract

This paper presents an automated research method that uses the topic modeling algorithm Latent Dirichlet Allocation (LDA) to discover latent topics and explore potential themes in YouTube transcript data.

In a 2015 study, Ganesan, Brantley, Pan, and Chen recognized a problem with the search process when trying to visualize the correlation between a large collection of documents and a given set of topics (Ganesan, Brantley, Pan, & Chen, 2015). Chaney and Blei emphasize in a 2012 study how important it is for science, industry, and culture to be able to explore the hidden structures found within large collections of unorganized documents (Chaney & Blei, 2012). Wang and Blei published an article in 2011 citing the difficulty of finding and recommending relevant scientific research papers to communities of researchers (Wang & Blei, 2011). Finally, a 2016 article by Roberts, Stewart, and Airoldi discusses the popularity of statistical models and how they are used for exploring large collections of documents to measure latent linguistic, political, and psychological variables in the social sciences (Roberts, Stewart, & Airoldi, 2016).

Taken together, these studies describe a problem that continues to challenge researchers and information management practitioners who are attempting to explore a latent set of topics that form a common theme within a large collection of documents. Documents are being collected from a growing number of sources, which compounds the problems of complexity and lack of intuitive correlation. Novel research methods that combine several technologies into a single framework are needed to address these challenges. This research method proposes a framework that researchers and practitioners can use to discover latent topics within a target set of YouTube video transcript documents.

Keywords: Latent Dirichlet Allocation, LDA, Research Method, Text Mining, Topic Modeling, YouTube Transcript Data

Suggested Citation

Daniel, Clinton, Thematic Exploration of Youtube Data: A Methodology for Discovering Latent Topics (September 8, 2017). Seventh International Engaged Management Scholarship Conference, Muma Business Review, 1(12), 141-155. Available at SSRN: https://ssrn.com/abstract=3028820

Clinton Daniel (Contact Author)

University of South Florida

Tampa, FL 33620
United States
