Improving Causal Inference with Text as Data in Empirical IS Research: A Machine Learning Approach

The 48th International Conference on Information Systems, 2020

9 Pages. Posted: 19 Dec 2020


Guopeng Yin

RMF of the Harvard Medical Institutions

Jian Chen

Ohio State University (OSU), College of Food, Agricultural & Environmental Science, Agricultural, Environmental & Development Economics, Students

Date Written: October 21, 2020

Abstract

This study combines two streams of literature, text representation and machine learning-based causal inference, to examine how representing text as data can improve causal inference, i.e., yield more accurate treatment effect estimates. Using Yelp reviews as a real-world context, we demonstrate how to train a topic model or a Word2Vec model to transform review text into meaningful quantitative features, and how to use a causal forest to estimate the treatment effect of Yelp’s ‘Elite’ badge on the number of votes a review receives. The results show that the estimated average treatment effect (ATE) decreases significantly once quantitative text representations are added to the model, which implies that the positive effect of the ‘Elite’ badge is overestimated when text information is omitted. We also present specific steps to help other researchers use the causal forest to estimate heterogeneous effects across subgroups. Overall, we show that transforming text into quantitative data makes treatment effect estimation more accurate.
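The overestimation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses synthetic data rather than Yelp reviews, a single hypothetical text-derived score (`quality`) in place of topic-model or Word2Vec features, and a simple stratified difference in means in place of the causal forest. All names and parameter values are assumptions chosen for illustration. It shows only the general mechanism: when a text feature drives both badge status and votes, leaving it out inflates the estimated ATE.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=1.0, seed=0):
    """Synthetic reviews in which a text score drives both badge and votes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hypothetical text-derived score in [0, 1), standing in for a
        # topic proportion or an embedding-based feature.
        quality = rng.random()
        # Reviews with higher scores are more likely to carry the badge.
        elite = 1 if rng.random() < 0.2 + 0.6 * quality else 0
        # Votes depend on the badge (true effect) and on the text score.
        votes = true_effect * elite + 5.0 * quality + rng.gauss(0, 1)
        rows.append((quality, elite, votes))
    return rows


def naive_ate(rows):
    """Difference in mean votes, ignoring the text score."""
    treated = [v for _, e, v in rows if e == 1]
    control = [v for _, e, v in rows if e == 0]
    return mean(treated) - mean(control)


def stratified_ate(rows, bins=10):
    """Difference in means within text-score bins, weighted by bin size."""
    total, used = 0.0, 0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        stratum = [r for r in rows if lo <= r[0] < hi]
        treated = [v for _, e, v in stratum if e == 1]
        control = [v for _, e, v in stratum if e == 0]
        if treated and control:  # skip bins lacking one of the groups
            total += len(stratum) * (mean(treated) - mean(control))
            used += len(stratum)
    return total / used


rows = simulate()
naive = naive_ate(rows)
adjusted = stratified_ate(rows)
print(f"true effect:  {1.0:.2f}")
print(f"naive ATE:    {naive:.2f}")
print(f"adjusted ATE: {adjusted:.2f}")
```

In this setup the naive estimate absorbs the effect of the text score and lands well above the true effect, while the stratified estimate stays close to it. A causal forest plays a similar adjusting role with many text features at once, and additionally yields effect estimates that vary across subgroups.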

Keywords: Causal Inference, Heterogeneous Treatment Effect, Text Representation, NLP, Machine Learning, Online Reviews

JEL Classification: C1, M16

Suggested Citation

Yin, Guopeng and Chen, Jian, Improving Causal Inference with Text as Data in Empirical IS Research: A Machine Learning Approach (October 21, 2020). The 48th International Conference on Information Systems, 2020, Available at SSRN: https://ssrn.com/abstract=3716465

Guopeng Yin

RMF of the Harvard Medical Institutions

Boston, MA 02215
United States

Jian Chen (Contact Author)

Ohio State University (OSU), College of Food, Agricultural & Environmental Science, Agricultural, Environmental & Development Economics, Students

2120 Fyffe Rd
Columbus, OH 43210
United States

