A Missing Data Paradox for Nearest Neighbor Recommender Systems

Fleder, Daniel M.; Hosanagar, Kartik

doi:10.2139/ssrn.1322548


7 pages. Posted: 4 Jan 2009. Last revised: 19 Aug 2018.

Daniel M. Fleder

University of Pennsylvania - The Wharton School

Kartik Hosanagar

University of Pennsylvania - Operations & Information Management Department

Date Written: October 1, 2007

Abstract

Recommender systems typically work over sparse matrices. Most methods assume that the missing entries are missing at random (MAR), but in practice they are often not missing at random (NMAR). How problematic is this? We present a puzzle. Some methods explicitly model the NMAR process, and this has been shown to improve predictions. Many methods, however, simply assume MAR. Although that assumption may be wrong, we show that such methods can nonetheless benefit from its violation: given that some data must go missing, an NMAR process can often pick the "right" values to preserve (i.e., it preserves the more informative data). Thus, despite the perception that NMAR is harmful, it can often improve recommendations. This may explain some of the historical success of collaborative filtering even when the MAR assumption has been violated.
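To make the setup concrete, here is a minimal Python sketch of a user-based nearest-neighbor recommender fed by a MAR versus an NMAR observation mechanism. It is an illustration under stated assumptions, not the authors' simulation: the synthetic two-group rating data, the specific NMAR mechanism (ratings far from the midpoint 3 are more likely to be observed), the Pearson-similarity neighbor rule, and every function name (`synthetic_ratings`, `mar_mask`, `nmar_mask`, `predict`, `rmse`) are hypothetical. Both mechanisms keep the same number of entries and are scored on the same held-out cells, so the only difference is which values go missing.

```python
import math
import random

Cell = tuple[int, int]


def synthetic_ratings(n_users: int, n_items: int, seed: int = 0) -> list[list[int]]:
    """Hypothetical data: two taste groups that prefer opposite halves of the items (1-5 scale)."""
    rng = random.Random(seed)
    return [
        [min(5, max(1, round((4.0 if i % 2 == u % 2 else 2.0) + rng.gauss(0, 1))))
         for i in range(n_items)]
        for u in range(n_users)
    ]


def mar_mask(pool: list[Cell], k: int, seed: int = 1) -> set[Cell]:
    """MAR: keep k cells chosen uniformly, independent of their values."""
    return set(random.Random(seed).sample(pool, k))


def nmar_mask(ratings: list[list[int]], pool: list[Cell], k: int, seed: int = 1) -> set[Cell]:
    """Assumed NMAR mechanism: extreme ratings (far from 3) are more likely to be observed.

    Weighted sampling without replacement via Efraimidis-Spirakis keys U**(1/w).
    """
    rng = random.Random(seed)
    keyed = sorted(
        ((rng.random() ** (1.0 / (abs(ratings[u][i] - 3) + 0.5)), (u, i)) for u, i in pool),
        reverse=True,
    )
    return {cell for _, cell in keyed[:k]}


def observed_rows(ratings: list[list[int]], observed: set[Cell]) -> dict[int, dict[int, int]]:
    """Per-user dict of observed item -> rating: the sparse matrix the recommender sees."""
    rows: dict[int, dict[int, int]] = {u: {} for u in range(len(ratings))}
    for u, i in observed:
        rows[u][i] = ratings[u][i]
    return rows


def mean(xs) -> float:
    xs = list(xs)
    return sum(xs) / len(xs)


def pearson(a: dict[int, int], b: dict[int, int]) -> float:
    """Pearson correlation over co-rated items; 0 when fewer than 2 items overlap."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a[j] for j in common), mean(b[j] for j in common)
    num = sum((a[j] - ma) * (b[j] - mb) for j in common)
    den = math.sqrt(sum((a[j] - ma) ** 2 for j in common) * sum((b[j] - mb) ** 2 for j in common))
    return num / den if den > 0 else 0.0


def predict(rows: dict[int, dict[int, int]], u: int, i: int, n_neighbors: int = 5) -> float:
    """User-based kNN: user mean plus similarity-weighted, mean-centered neighbor ratings."""
    own = rows[u]
    base = mean(own.values()) if own else 3.0  # assumed fallback: scale midpoint
    neighbors = []
    for v, row in rows.items():
        if v != u and i in row:
            s = pearson(own, row)
            if s > 0:  # keep only positively correlated neighbors
                neighbors.append((s, v))
    neighbors = sorted(neighbors, reverse=True)[:n_neighbors]
    if not neighbors:
        return base
    num = sum(s * (rows[v][i] - mean(rows[v].values())) for s, v in neighbors)
    den = sum(s for s, _ in neighbors)
    return min(5.0, max(1.0, base + num / den))  # clamp to the rating scale


def rmse(ratings: list[list[int]], observed: set[Cell], test: list[Cell]) -> float:
    """Root-mean-square error of predictions on held-out cells."""
    rows = observed_rows(ratings, observed)
    return math.sqrt(mean((predict(rows, u, i) - ratings[u][i]) ** 2 for u, i in test))


if __name__ == "__main__":
    ratings = synthetic_ratings(40, 30)
    cells = [(u, i) for u in range(40) for i in range(30)]
    test = random.Random(42).sample(cells, 200)  # same held-out cells for both mechanisms
    held_out = set(test)
    pool = [c for c in cells if c not in held_out]
    k = 300  # both mechanisms observe exactly k entries
    for name, obs in [("MAR", mar_mask(pool, k)), ("NMAR", nmar_mask(ratings, pool, k))]:
        print(name, round(rmse(ratings, obs, test), 3))
```

The sketch deliberately makes no claim about which mechanism yields lower error; that depends on the assumed data and missingness model, and the paper's own analysis is what establishes when NMAR helps.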

Keywords: recommender systems, collaborative filtering, predictive modeling, missing data

Suggested Citation:

Fleder, Daniel M. and Hosanagar, Kartik, A Missing Data Paradox for Nearest Neighbor Recommender Systems (October 1, 2007). Available at SSRN: https://ssrn.com/abstract=1322548 or http://dx.doi.org/10.2139/ssrn.1322548