Here’s a link to a small R project that uses a decomposition of TF-IDF term vectors to experiment with search.

Project Link: text_svd

Text SVD

This is an experiment in using an SVD of a document-term matrix to search documents: a query term vector is projected into the reduced latent space given by the SVD, and results are ranked by cosine similarity. The method could potentially be extended to document clustering, or used to gain further insights into common themes based on strongly related terms.
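The search idea above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the project's R code: the toy corpus, the `log(N/df) + 1` IDF weighting, the choice of `k = 2`, and the helper names `term_counts` and `search` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy corpus: two documents about animals, two about markets.
docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks and bonds fall",
    "bonds rally as stocks fall",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def term_counts(text):
    """Raw term-count vector over the corpus vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

# Document-term matrix (rows = documents, columns = terms) with TF-IDF weights.
counts = np.array([term_counts(d) for d in docs])
df = (counts > 0).sum(axis=0)            # document frequency of each term
idf = np.log(len(docs) / df) + 1.0       # assumed IDF variant
A = counts * idf

# Thin SVD: A = U @ diag(s) @ Vt. Keep the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Vk = Vt[:k].T                            # terms x k projection matrix
doc_latent = A @ Vk                      # documents in latent space (equals U_k * s_k)

def search(query):
    """Project the query into the latent space and rank documents by cosine similarity."""
    q = (term_counts(query) * idf) @ Vk
    norms = np.linalg.norm(doc_latent, axis=1) * np.linalg.norm(q)
    sims = doc_latent @ q / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)
    return [(int(i), float(sims[i])) for i in order]

for doc_id, score in search("mice"):
    print(f"{score:6.3f}  {docs[doc_id]}")
```

In this toy corpus, the query "mice" ranks both animal documents at the top, including "dogs chase cats", which does not contain the query term. That is the kind of related-term matching the decomposition is meant to provide.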

The project includes an example of the first use case, search, in the form of an R Shiny user interface.

The ideas in this project were later incorporated into the library I’m slowly tinkering with, and are described on that project's page as Notes on Singular Value Decomposition and Latent Semantic Indexing.

The clustering method, however, was derived from ideas in the book “Understanding Complex Datasets: Data Mining with Matrix Decompositions” by David Skillicorn, which has quite a nice summary of SVD.