Prediction for Big Data Through Kriging: Small Sequential and One-Shot Designs
CentER Discussion Paper Series No. 2018-022
43 Pages. Posted: 30 Jul 2018
Date Written: July 9, 2018
Abstract
Kriging, or Gaussian process (GP) modeling, is an interpolation method that assumes the outputs (responses) are more strongly correlated the closer their inputs (explanatory or independent variables) are. A GP has unknown (hyper)parameters that must be estimated; the standard estimation method uses the "maximum likelihood" criterion. However, big data make it hard to compute the estimates of these GP parameters, the resulting Kriging predictor, and the variance of this predictor. To solve this problem, some authors select a relatively small subset from the big set of previously observed "old" data; their method is sequential and depends on the variance of the Kriging predictor. The resulting designs turn out to be "local"; i.e., most design points are concentrated around the point to be predicted. We develop three alternative one-shot methods that do not depend on GP parameters: (i) select a small subset that still covers the original input space, albeit more coarsely; (ii) select a subset with relatively many, but not all, combinations close to the new combination that is to be predicted; and (iii) select a subset with the nearest neighbors (NNs) of this new combination. To evaluate these designs, we compare their squared prediction errors in several numerical (Monte Carlo) experiments. These experiments show that our NN design is a viable alternative to the more sophisticated sequential designs.
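As an illustration of the nearest-neighbor idea in (iii), the following is a minimal sketch, not the paper's implementation. It selects the k old points closest to the new input in one shot and then applies a Kriging predictor to that subset only. The function names, the Gaussian correlation function, the fixed value of theta, the small nugget, and the test function are all assumptions made for this example; in practice the GP parameters would be estimated (e.g., by maximum likelihood) from the selected subset.

```python
import numpy as np

def nearest_neighbor_design(X_old, x_new, k):
    """Indices of the k old input points closest to x_new (Euclidean distance)."""
    dists = np.linalg.norm(X_old - x_new, axis=1)
    return np.argsort(dists)[:k]

def gaussian_corr(A, B, theta):
    """Gaussian correlation exp(-theta * squared distance); theta is assumed, not estimated."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)

def kriging_predict(X_sub, y_sub, x_new, theta=10.0, nugget=1e-8):
    """Kriging predictor with a constant mean estimated by generalized least squares.

    theta and nugget are fixed here for illustration (assumption).
    """
    n = len(X_sub)
    R = gaussian_corr(X_sub, X_sub, theta) + nugget * np.eye(n)  # nugget for numerical stability
    r = gaussian_corr(X_sub, x_new[None, :], theta).ravel()
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(R, y_sub)) / (ones @ np.linalg.solve(R, ones))
    return mu + r @ np.linalg.solve(R, y_sub - mu * ones)

# Hypothetical "big" old data set: 2000 points in the unit square.
rng = np.random.default_rng(0)
X_old = rng.random((2000, 2))
y_old = np.sin(3.0 * X_old[:, 0]) + np.cos(2.0 * X_old[:, 1])

x_new = np.array([0.4, 0.6])
idx = nearest_neighbor_design(X_old, x_new, k=30)   # one-shot local design
y_hat = kriging_predict(X_old[idx], y_old[idx], x_new)
y_true = np.sin(3.0 * x_new[0]) + np.cos(2.0 * x_new[1])
print("squared prediction error:", (y_hat - y_true) ** 2)
```

The selection step uses only distances between inputs, so it needs no GP parameters, and the linear systems solved afterwards are of size k instead of the size of the full old data set.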
Keywords: Kriging; Gaussian Process; Big Data; Experimental Design; Nearest Neighbor
JEL Classification: C0; C1; C9; C15; C44