Cluster Analysis of Imputed Financial Data Using an Augmentation-Based Algorithm
STATISTICAL DATA MINING AND KNOWLEDGE DISCOVERY, Bensmail, Halima and Romon P. DeGennaro, eds., CRC Press, pp. 513-528, 2003
Posted: 4 Aug 2005
Abstract
We introduce a novel statistical modeling technique for cluster analysis and apply it to financial data. Our main goals are to handle missing data and to find homogeneous groups within the data. Our approach is flexible: it handles large and complex data structures that contain missing observations and a mix of quantitative and qualitative measurements. We achieve this by mapping the data to a new structure in which homogeneous groups of observations can be chosen without distributional assumptions. Our method also provides insight into the number of categories needed to classify the data. We use this approach to partition a matched sample of stocks in which one group offers dividend reinvestment plans and the other does not. Our approach partitions this sample with almost 97 percent accuracy, even when using only easily available financial variables. One interpretation of this result is that the misclassified companies are the best candidates either to adopt a dividend reinvestment plan (if they have none) or to abandon one (if they currently offer one). We offer other suggestions for applications in the field of finance.
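To make the idea of partition accuracy on incomplete data concrete, the following is a minimal sketch and not the augmentation-based algorithm described above. It swaps in two much simpler techniques, column-mean imputation and plain k-means, and then scores the resulting two-group partition against known group membership. The function names, the two features, and the six toy observations are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k=2, n_iter=50):
    """Plain Lloyd's algorithm, seeded deterministically with the first k rows."""
    centers = [list(r) for r in rows[:k]]
    assignment = [0] * len(rows)
    for _ in range(n_iter):
        # Assign every observation to its nearest center.
        assignment = [
            min(range(k), key=lambda c: squared_distance(r, centers[c]))
            for r in rows
        ]
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def partition_accuracy(true_labels, clusters, k=2):
    """Best agreement over all relabelings, since cluster ids are arbitrary."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, c in zip(true_labels, clusters) if t == perm[c])
        best = max(best, hits)
    return best / len(true_labels)


# Hypothetical toy data: two made-up financial variables per firm,
# with None marking a missing observation.
toy_rows = [
    [1.0, 2.0],
    [8.0, 9.0],
    [1.2, None],
    [7.8, 8.5],
    [0.9, 2.2],
    [None, 9.1],
]
toy_truth = [0, 1, 0, 1, 0, 1]  # hypothetical: 1 = offers a plan, 0 = does not

toy_clusters = k_means(impute_column_means(toy_rows), k=2)
toy_accuracy = partition_accuracy(toy_truth, toy_clusters)
```

Under this scoring, the observations that fall on the wrong side of the partition correspond to what the abstract calls misclassified companies. Mean imputation is used here only because it is short; it ignores the uncertainty in the missing values, which is the problem an augmentation-based approach is designed to address.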
Keywords: cluster analysis, missing data, dividend reinvestment plan
JEL Classification: C00, G35