Orthogonalization of Categorical Data: How to Fix a Measurement Problem in Statistical Distance Metrics

43 Pages. Posted: 21 Nov 2013

Date Written: November 10, 2013

Abstract

Policy makers depend on economists, statisticians, and other social scientists to make accurate observations and draw sound conclusions from quantitative analysis. Econometrics, for example, has come a long way in the past century and guides many decisions made today. Other statistical procedures, however, have seen no significant advances; they continue to be applied while their original assumptions are forgotten. The appropriateness of many of these measurements has come into question, and while the criticism is often accepted, little is done to correct them. As a result, a pervasive measurement problem recurs every day: the use of statistical distance metrics to measure social phenomena. Such metrics are routinely used to answer questions like: By how much have the imports of the United States changed in the past year? By how much has racial diversity changed in the past decade? Does greater ethno-linguistic diversity lead to civil conflict? These and similar questions rely on accurate multivariate distance metrics. However, all distance metrics suffer from a common calculation problem. The math itself is correct; the problem lies in an overlooked implicit assumption, namely that all categories are mutually orthogonal (at right angles to one another). This is a bold assumption in any context. In this paper I first show that this assumption is rarely valid, and second I suggest an orthogonalization procedure: measure the similarity, or angle, between categories, and then apply a transformation from spherical to rectangular coordinates. I illustrate the effect of the methodology using a simulation, a collection of potential applications, and two examples from international trade.
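
The idea of correcting a distance for non-orthogonal categories can be illustrated with a minimal sketch. The abstract does not spell out the paper's exact procedure, so the code below is an assumption, not the author's implementation: it takes a hypothetical matrix of pairwise cosines between categories, uses a Cholesky factorization as a stand-in for the spherical-to-rectangular transformation, and computes a Euclidean distance in the resulting rectangular coordinates. The function names (cholesky, to_rectangular, adjusted_distance) and the example numbers are invented for illustration.

```python
import math

def cholesky(gram):
    """Return lower-triangular L with L * L^T == gram.

    gram is assumed to be a symmetric positive-definite matrix of
    cosines between category axes (ones on the diagonal).
    """
    n = len(gram)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(gram[i][i] - s)
            else:
                L[i][j] = (gram[i][j] - s) / L[j][j]
    return L

def to_rectangular(values, cosines):
    """Express category values in orthogonal (rectangular) coordinates.

    Row i of L holds the rectangular coordinates of category i's unit
    axis, so the result is the sum of those axes weighted by values.
    """
    L = cholesky(cosines)
    n = len(values)
    return [sum(values[i] * L[i][j] for i in range(n)) for j in range(n)]

def adjusted_distance(x, y, cosines):
    """Euclidean distance between x and y after orthogonalization."""
    diff = [a - b for a, b in zip(x, y)]
    rect = to_rectangular(diff, cosines)
    return math.sqrt(sum(c * c for c in rect))

# Hypothetical example: activity shifts from category 1 to category 2.
x = [3.0, 0.0]
y = [0.0, 4.0]

orthogonal = [[1.0, 0.0], [0.0, 1.0]]   # categories at 90 degrees
similar = [[1.0, 0.5], [0.5, 1.0]]      # categories at 60 degrees (cos = 0.5)

d_orthogonal = adjusted_distance(x, y, orthogonal)
d_similar = adjusted_distance(x, y, similar)
```

With orthogonal categories the sketch reproduces the ordinary Euclidean distance of 5. With a cosine of 0.5 the squared distance follows the law of cosines, 9 + 16 - 2(3)(4)(0.5) = 13, so a shift between similar categories registers as a smaller change than the orthogonality assumption would imply.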

Keywords: Index Number Theory, International Trade, Orthogonalization, Principal Coordinates, Law of Cosines, Distance Metrics, Minkowski Metric, Euclidean Distance, Hirschman-Herfindahl Index, Business Analytics

JEL Classification: C43, C18, F10

Suggested Citation

Knippenberg, Ross W., Orthogonalization of Categorical Data: How to Fix a Measurement Problem in Statistical Distance Metrics (November 10, 2013). Available at SSRN: https://ssrn.com/abstract=2357607 or http://dx.doi.org/10.2139/ssrn.2357607

Ross W. Knippenberg (Contact Author)

Caterpillar, Inc.

United States
