Extracting Clusters from Large Datasets with Multiple Similarity Measures Using IMSCAND

Abstract

We consider the problem of how to group information when multiple similarities are known. For a group of people, we may know their education, geographic location, and family connections, and want to cluster the people by treating all three of these similarities simultaneously. Our approach is to store each similarity as a slice in a tensor. The similarity measures are generated by comparing features. Generally, the object similarity matrix is dense. However, it can be stored implicitly as the product of a sparse matrix, representing the object-feature matrix, and its transpose. For this new type of tensor, where dense slices are stored implicitly, we have created a new decomposition called Implicit Slice Canonical Decomposition (IMSCAND). Our decomposition is equivalent to the tensor CANDECOMP/PARAFAC decomposition, which is a higher-order analogue of the matrix singular value decomposition (SVD) and principal component analysis (PCA). From IMSCAND we obtain compilation feature vectors, which are clustered using k-means. We demonstrate the applicability of IMSCAND on a set of journal articles with multiple similarities.
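
The implicit storage idea in the abstract can be illustrated with a minimal sketch. It assumes a toy object-feature matrix A (4 objects, 5 features) held as nested dictionaries; the data, the function names, and the storage layout are all hypothetical and are not taken from the paper. The sketch shows only that a product with the dense similarity slice S = A A^T can be computed as A (A^T x), so S itself never has to be formed. It does not implement IMSCAND.

```python
# Hypothetical toy data (not from the paper): 4 objects x 5 features,
# stored sparsely as {object_index: {feature_index: value}}.
A = {
    0: {0: 1.0, 2: 1.0},
    1: {0: 1.0, 3: 1.0},
    2: {1: 1.0, 4: 1.0},
    3: {1: 1.0, 2: 1.0, 4: 1.0},
}
N_OBJECTS, N_FEATURES = 4, 5


def transpose_times(A, x, n_features):
    """Return y = A^T x, touching only the stored nonzeros of A."""
    y = [0.0] * n_features
    for i, row in A.items():
        for j, value in row.items():
            y[j] += value * x[i]
    return y


def times(A, y, n_objects):
    """Return z = A y, touching only the stored nonzeros of A."""
    z = [0.0] * n_objects
    for i, row in A.items():
        z[i] = sum(value * y[j] for j, value in row.items())
    return z


def implicit_similarity_matvec(A, x, n_objects, n_features):
    """Compute (A A^T) x as A (A^T x) without forming the dense A A^T."""
    return times(A, transpose_times(A, x, n_features), n_objects)


def dense_similarity(A, n_objects):
    """Form S = A A^T explicitly; used here only to check the toy example."""
    return [
        [sum(v * A[k].get(j, 0.0) for j, v in A[i].items())
         for k in range(n_objects)]
        for i in range(n_objects)
    ]


x = [1.0, 2.0, 3.0, 4.0]
z_implicit = implicit_similarity_matvec(A, x, N_OBJECTS, N_FEATURES)
S = dense_similarity(A, N_OBJECTS)
z_dense = [sum(S[i][k] * x[k] for k in range(N_OBJECTS)) for i in range(N_OBJECTS)]
```

Each implicit product costs time proportional to the number of nonzeros in A, whereas the explicit route first builds an object-by-object matrix that is generally dense.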

Publication
In CSRI Summer Proceedings 2007
Date
December 2007
Citation
T. M. Selee, T. G. Kolda, W. P. Kegelmeyer, J. D. Griffin. Extracting Clusters from Large Datasets with Multiple Similarity Measures Using IMSCAND. In CSRI Summer Proceedings 2007, M. L. Parks and S. S. Collis (eds.), Tech. Rep. SAND2007-7977, Sandia National Laboratories, pp. 87-103, 2007. http://www.cs.sandia.gov/CSRI/Proceedings/CSRI2007.pdf

BibTeX

@inproceedings{SeKoKeGr07,
  author = {Teresa M. Selee and Tamara G. Kolda and W. Philip Kegelmeyer and Joshua D. Griffin},
  title = {Extracting Clusters from Large Datasets with Multiple Similarity Measures Using {IMSCAND}},
  booktitle = {CSRI Summer Proceedings 2007},
  editor = {Michael L. Parks and S. Scott Collis},
  publisher = {Tech. Rep. SAND2007-7977, Sandia National Laboratories},
  pages = {87--103},
  month = {December},
  year = {2007},
  url = {http://www.cs.sandia.gov/CSRI/Proceedings/CSRI2007.pdf},
}