Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

Abstract

Given two sets of vectors, $A = \{a_1, \dots, a_m\}$ and $B = \{b_1,\dots,b_n\}$, our problem is to find the top-$t$ dot products, i.e., the largest $|a_i \cdot b_j|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all $mn$ dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of $A$ and $B$. The probability of selecting a diamond corresponding to pair $(i,j)$ is proportional to $(a_i \cdot b_j)^2$, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.
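
The sampling idea described in the abstract can be illustrated with a small sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the dense layout (rows of `A` are the $a_i$, columns of `B` are the $b_j$), the particular edge weighting, and the `budget` parameter for exact re-scoring are all hypothetical choices made here. Each sample draws a weighted edge $(i,k)$, then a neighbor $j$ of $k$ and a second neighbor $k'$ of $i$, and adds a signed $b_{k'j}$ to the score of $(i,j)$. Under this weighting the expected score of $(i,j)$ is proportional to $(a_i \cdot b_j)^2$, which is the property the abstract states.

```python
import numpy as np


def diamond_sample_scores(A, B, num_samples, rng=None):
    """Return an (m, n) score matrix X with E[X[i, j]] proportional to (a_i . b_j)**2.

    Assumed layout: A is (m, d) with rows a_i, B is (d, n) with columns b_j.
    """
    rng = np.random.default_rng(rng)
    m, d = A.shape
    n = B.shape[1]
    abs_A, abs_B = np.abs(A), np.abs(B)
    a_row_l1 = abs_A.sum(axis=1)  # 1-norm of each row of A, shape (m,)
    b_row_l1 = abs_B.sum(axis=1)  # 1-norm of each row of B, shape (d,)

    # Assumed edge weight for (i, k): |a_ik| * ||A[i, :]||_1 * ||B[k, :]||_1.
    W = abs_A * a_row_l1[:, None] * b_row_l1[None, :]
    edge_probs = (W / W.sum()).ravel()

    X = np.zeros((m, n))
    for e in rng.choice(m * d, size=num_samples, p=edge_probs):
        i, k = divmod(e, d)
        # Complete the path k -> j and i -> k'; closing edge (k', j) gives a diamond.
        j = rng.choice(n, p=abs_B[k] / b_row_l1[k])
        k2 = rng.choice(d, p=abs_A[i] / a_row_l1[i])
        sign = np.sign(A[i, k] * B[k, j] * A[i, k2])
        X[i, j] += sign * B[k2, j]
    return X


def top_t_pairs(A, B, t, num_samples, budget, rng=None):
    """Re-score the `budget` highest-scoring sampled pairs exactly; return the top t."""
    X = diamond_sample_scores(A, B, num_samples, rng)
    budget = min(budget, X.size)
    candidates = np.argpartition(X.ravel(), -budget)[-budget:]
    ii, jj = np.unravel_index(candidates, X.shape)
    # Exact |a_i . b_j| for candidate pairs only (budget dot products, not m * n).
    exact = np.abs(np.einsum("ck,kc->c", A[ii], B[:, jj]))
    order = np.argsort(-exact)[:t]
    return [(int(ii[o]), int(jj[o]), float(exact[o])) for o in order]
```

A call such as `top_t_pairs(A, B, t=10, num_samples=100_000, budget=100)` returns `(i, j, |a_i . b_j|)` triples. The exact re-scoring step touches only `budget` pairs, which is where the savings over computing all $mn$ dot products would come from; how large `num_samples` and `budget` need to be depends on the data.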

Publication
In ICDM 2015: Proceedings of the 2015 IEEE International Conference on Data Mining
Citation
G. Ballard, A. Pinar, T. G. Kolda, C. Seshadri. Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search. In ICDM 2015: Proceedings of the 2015 IEEE International Conference on Data Mining, Atlantic City, NJ (2015-11-14 to 2015-11-17), pp. 11-20, 2015. https://doi.org/10.1109/ICDM.2015.46

Comments

Winner of the ICDM 2015 Best Paper Prize!

BibTeX

@inproceedings{BaPiKoSe15,  
author = {Grey Ballard and Ali Pinar and Tamara G. Kolda and C. Seshadri}, 
title = {Diamond Sampling for Approximate Maximum All-pairs Dot-product ({MAD}) Search}, 
booktitle = {ICDM 2015: Proceedings of the 2015 IEEE International Conference on Data Mining},
venue = {Atlantic City, NJ},
eventdate = {2015-11-14/2015-11-17}, 
pages = {11--20},
month = {November}, 
year = {2015},
doi = {10.1109/ICDM.2015.46},
eprint = {1506.03872},
}