Orthogonalization of Categorical Data: How to Fix a Measurement Problem in Statistical Distance Metrics
43 Pages. Posted: 21 Nov 2013
Date Written: November 10, 2013
Abstract
Policy makers depend on economists, statisticians, and other social scientists to make accurate observations and draw solid conclusions from quantitative analysis. Econometrics, for example, has advanced considerably over the past century and guides many of today's decisions. Other statistical procedures, by contrast, have seen little development: they continue to be applied while their original assumptions are forgotten. The appropriateness of many of these measurements has been questioned, yet even when the criticism is accepted, little is done to correct them. In fact, a pervasive measurement problem is committed every day: the use of statistical distance metrics to measure social phenomena. Such metrics are routinely used to answer questions like: By how much have United States imports changed in the past year? By how much has racial diversity changed in the past decade? Does greater ethno-linguistic diversity lead to civil conflict? These and similar questions rely on accurate multivariate distance metrics. However, all distance metrics suffer from a common calculation problem. The mathematics itself is correct; the problem lies in an overlooked implicit assumption: that all categories are mutually orthogonal (at right angles to one another). This is a bold assumption in any context. In this paper I first show that this assumption is rarely valid, and second I propose an orthogonalization procedure: measure the similarity, or angle, between categories, and then apply a transformation from spherical to rectangular coordinates. I illustrate the effect of the methodology using a simulation, a collection of potential applications, and two examples from international trade.
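The sketch below is a hypothetical illustration of why the orthogonality assumption matters, not the paper's implementation. It compares ordinary Euclidean distance, which treats every category as its own right-angled axis, with a distance in which category axes meet at assumed angles. The function names and the angle values are illustrative assumptions. In place of the paper's spherical-to-rectangular transformation it swaps in a Cholesky factorization of the matrix of cosines between categories: each row of the factor gives a category axis in rectangular coordinates, so that pairs of axes meet at the assumed angles. The paper's own procedure, and how it measures the angles, may differ in detail.

```python
import numpy as np


def euclidean(x, y):
    """Standard Euclidean distance: implicitly treats every category as an orthogonal axis."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ d))


def orthogonalized_distance(x, y, angles_deg):
    """Distance when category axes meet at assumed pairwise angles (hypothetical helper).

    angles_deg[i][j] is an assumed angle between categories i and j, in degrees
    (90 = unrelated, as Euclidean distance assumes; smaller = more similar).
    G[i, j] = cos(angle) holds the inner products of the category axes.
    The Cholesky factor L (G = L @ L.T) places each axis in rectangular
    coordinates (row i of L), so a difference vector d maps to L.T @ d.
    """
    G = np.cos(np.radians(np.asarray(angles_deg, dtype=float)))
    L = np.linalg.cholesky(G)  # raises LinAlgError unless G is positive definite
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    z = L.T @ d                # the difference expressed in an orthonormal frame
    return float(np.sqrt(z @ z))  # equals sqrt(d @ G @ d), a law-of-cosines form


# Two hypothetical share vectors: everything in category A vs. everything in category B.
x = [1.0, 0.0]
y = [0.0, 1.0]
similar = [[0, 60], [60, 0]]    # assumption: A and B meet at 60 degrees
unrelated = [[0, 90], [90, 0]]  # the implicit Euclidean assumption

print(round(euclidean(x, y), 6))                          # 1.414214
print(round(orthogonalized_distance(x, y, unrelated), 6))  # 1.414214
print(round(orthogonalized_distance(x, y, similar), 6))    # 1.0
```

With axes at 90 degrees the two measures agree; at 60 degrees the law of cosines gives 1 + 1 - 2 cos 60° = 1, so the same shift between categories registers as a smaller distance when the categories are similar.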
Keywords: Index Number Theory, International Trade, Orthogonalization, Principal Coordinates, Law of Cosines, Distance Metrics, Minkowski Metric, Euclidean Distance, Hirschman-Herfindahl Index, Business Analytics
JEL Classification: C43, C18, F10