Efficient Methods for Sampling Responses from Large-Scale Qualitative Data

Marketing Science, Vol. 30, No. 3, May–June 2011, pp. 532–549

Posted: 21 Apr 2016

Surendra Singh

University of Kansas - School of Business

Ze Wang

University of Central Florida

Date Written: 2011

Abstract

The World Wide Web contains a vast corpus of consumer-generated content that holds invaluable insights for improving the product and service offerings of firms. Yet text mining, the typical method for extracting diagnostic information from online content, has limitations. As a starting point, we propose analyzing a sample of comments before initiating text mining. Using a combination of real data and simulations, we demonstrate that a sampling procedure that selects respondents whose comments contain a large amount of information is superior to the two most popular sampling methods (simple random sampling and stratified random sampling) in gaining insights from the data. In addition, we derive a method that determines the probability of observing diagnostic information repeated a specific number of times in the population, which will enable managers to base sample size decisions on the trade-off between obtaining additional diagnostic information and the added expense of a larger sample. We provide an illustration of one of the methods using a real data set from a website containing qualitative comments about staying at a hotel and demonstrate how sampling qualitative comments can be a useful first step in text mining.
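
As a rough illustration of the two ideas in the abstract, and not the authors' actual procedure or derivation, the Python sketch below rests on two simplifying assumptions. First, the number of distinct words in a comment is treated as a proxy for how much information it carries. Second, each sampled comment is assumed to mention a given piece of diagnostic information independently with probability p, so the number of mentions in a sample of size n follows a binomial distribution. All function names are hypothetical.

```python
from math import comb


def select_most_informative(comments, n):
    """Return the n comments with the most distinct words.

    Assumption: distinct-word count stands in for "amount of information";
    the paper may define information content differently.
    """
    def score(text):
        return len(set(text.lower().split()))

    return sorted(comments, key=score, reverse=True)[:n]


def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p).

    Assumption: each of the n sampled comments mentions the item
    independently with the same probability p.
    """
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i = 0 .. k-1 (capped at n), then take the complement.
    below = sum(
        comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n + 1))
    )
    return 1.0 - below


def smallest_sample(p, k, target, max_n=100_000):
    """Smallest n with P(X >= k) >= target, or None if max_n is too small."""
    for n in range(max(k, 1), max_n + 1):
        if prob_at_least(n, p, k) >= target:
            return n
    return None


if __name__ == "__main__":
    reviews = ["ok", "room was clean and quiet", "bad bad bad"]
    print(select_most_informative(reviews, 1))
    # Sample size needed to see an item mentioned by 10% of commenters
    # at least once with 95% probability, under the binomial assumption.
    print(smallest_sample(p=0.10, k=1, target=0.95))
```

Under these assumptions, a manager could tabulate `smallest_sample` for several values of `k` and `target` to see how quickly the required sample size grows, which is the kind of cost-versus-information trade-off the abstract describes.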

Suggested Citation

Singh, Surendra and Wang, Ze, Efficient Methods for Sampling Responses from Large-Scale Qualitative Data (2011). Marketing Science, Vol. 30, No. 3, May–June 2011, pp. 532–549, Available at SSRN: https://ssrn.com/abstract=2767761

Surendra Singh (Contact Author)

University of Kansas - School of Business

1300 Sunnyside Avenue
Lawrence, KS 66045
United States

Ze Wang

University of Central Florida

4000 Central Florida Blvd
Orlando, FL 32816-1400
United States
4078236623 (Phone)
