Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition

Abstract

Conventional algorithms for finding low-rank canonical polyadic (CP) tensor decompositions are unwieldy for large sparse tensors. The CP decomposition can be computed by solving a sequence of overdetermined least squares problems with special Khatri-Rao structure. In this work, we present an application of randomized numerical linear algebra to fitting the CP decomposition of sparse tensors, solving a significantly smaller sampled least squares problem at each iteration with probabilistic guarantees on the approximation errors. Prior work has shown that sketching is effective in the dense case, but the prior approach cannot be applied to the sparse case because a fast Johnson-Lindenstrauss transform (e.g., using a fast Fourier transform) must be applied in each mode, causing the sparse tensor to become dense. Instead, we perform sketching through leverage score sampling, crucially relying on the fact that the structure of the Khatri-Rao product allows sampling from overestimates of the leverage scores without forming the full product or the corresponding probabilities. Naive application of leverage score sampling is ineffective because a few scores are often quite large, so the same few entries are sampled repeatedly. We improve the speed by combining repeated rows. Additionally, we propose a novel hybrid of deterministic and random leverage-score sampling which consistently yields improved fits. Numerical results on real-world large-scale tensors show the method is significantly faster than competing methods without sacrificing accuracy.
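
The sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `leverage_scores` and `sampled_krp_lstsq`, the two-factor setting, the row-ordering convention, and the dense right-hand side `X` are all assumptions made for illustration. The sketch relies on the fact that each leverage score of a Khatri-Rao product is bounded above by the product of the corresponding factor-matrix leverage scores, so row indices can be drawn independently per factor without forming the full product. Repeated draws are combined into a single weighted row.

```python
import numpy as np

def leverage_scores(A):
    # Squared row norms of an orthonormal basis for range(A);
    # assumes A has full column rank.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def sampled_krp_lstsq(A, B, X, s, rng):
    """Approximately solve min_W ||Z W - X||_F, where Z is the Khatri-Rao
    product of A (I x R) and B (J x R) with row i*J + j equal to A[i] * B[j].
    Only s sampled rows of Z are ever formed (hypothetical helper)."""
    J = B.shape[0]
    # Per-factor sampling probabilities from normalized leverage scores.
    pa = leverage_scores(A)
    pa = pa / pa.sum()
    pb = leverage_scores(B)
    pb = pb / pb.sum()
    # Draw one index per factor; the pair (i, j) identifies a row of Z.
    i = rng.choice(A.shape[0], size=s, p=pa)
    j = rng.choice(J, size=s, p=pb)
    # Combine repeated rows: keep each distinct pair once with its count.
    pairs, counts = np.unique(np.stack([i, j], axis=1), axis=0,
                              return_counts=True)
    i, j = pairs[:, 0], pairs[:, 1]
    # A row drawn c times with probability p gets weight sqrt(c / (s * p)).
    w = np.sqrt(counts / (s * pa[i] * pb[j]))
    Z_s = (A[i] * B[j]) * w[:, None]      # sampled, reweighted rows of Z
    X_s = X[i * J + j] * w[:, None]       # matching rows of the right-hand side
    W, *_ = np.linalg.lstsq(Z_s, X_s, rcond=None)
    return W
```

In this sketch `X` is stored densely for simplicity; for a sparse tensor, only the sampled rows of the unfolded tensor would need to be extracted. The hybrid deterministic and random scheme mentioned in the abstract is not shown here.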

Publication
arXiv
Date
June 2020
Citation
B. W. Larsen, T. G. Kolda. Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition. arXiv:2006.16438, 2020. http://arxiv.org/abs/2006.16438

Keywords

math.NA, cs.NA

BibTeX

@misc{arXiv-LaKo20,  
author = {Brett W. Larsen and Tamara G. Kolda}, 
title = {Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition}, 
month = {June}, 
year = {2020},
eprint = {2006.16438},
eprintclass = {math.NA},
}