Cluster similar attributes. The value from the V matrix for each cluster is retained so that the groupings of terms can be sorted by that value.
The output map is keyed by cluster. Each entry in a cluster's group is a tuple of (Cluster x TermColumnIndex x Term x Weight).
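A minimal sketch of the term grouping described above, assuming the Vt matrix is a plain `Array[Array[Double]]` with one row per component and one column per term. The names `TermClusterSketch`, `TermEntry`, and `clusterTerms` are hypothetical and not taken from the original code. The assignment rule (each term goes to the component with the largest value in its column) is also an assumption, since the text only states that the V value is retained for sorting.

```scala
object TermClusterSketch {
  // (Cluster, TermColumnIndex, Term, Weight), mirroring the tuple described above.
  type TermEntry = (Int, Int, String, Double)

  // Index of the largest element; assumes xs is non-empty.
  private def argMax(xs: Array[Double]): Int =
    xs.indices.reduceLeft((best, i) => if (xs(i) > xs(best)) i else best)

  // vt: k x n matrix (rows = components, columns = terms); terms: the n term labels.
  // Assumption: a term belongs to the component with the largest value in its column.
  def clusterTerms(vt: Array[Array[Double]], terms: Seq[String]): Map[Int, Seq[TermEntry]] = {
    val entries: Seq[TermEntry] = terms.indices.map { col =>
      val column = vt.map(row => row(col))
      val cluster = argMax(column)
      (cluster, col, terms(col), column(cluster))
    }
    // Key by cluster and sort each group by descending weight.
    entries.groupBy(_._1).map { case (cluster, group) =>
      cluster -> group.sortWith((a, b) => a._4 > b._4)
    }
  }
}
```

Keeping the weight inside each entry is what allows the groups to be sorted after the grouping step, without going back to the matrix.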
Cluster the documents by determining the maximum values in the U matrix. The U matrix is first shifted by subtracting its most negative value, so that all values are non-negative.
Returns the documents grouped by the cluster to which each belongs, as (ClusterId, Seq[(Int, Seq[String], Int)]).
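The document clustering step above can be sketched as follows, assuming the U matrix is a non-empty, rectangular `Array[Array[Double]]` with one row per document and one column per component. `DocumentClusterSketch` and `clusterDocuments` are hypothetical names. The sketch returns a simplified `(document row index, shifted U value)` pair per document and does not reproduce the `(Int, Seq[String], Int)` tuple, whose fields are not described here.

```scala
object DocumentClusterSketch {
  // u: m x k matrix (rows = documents, columns = components).
  // Returns cluster id -> Seq of (document row index, shifted U value).
  def clusterDocuments(u: Array[Array[Double]]): Map[Int, Seq[(Int, Double)]] = {
    // Subtract the smallest entry so that every value is non-negative.
    val min = u.flatten.reduceLeft((a, b) => if (b < a) b else a)
    val shifted = u.map(row => row.map(_ - min))

    val assignments = shifted.toSeq.zipWithIndex.map { case (row, doc) =>
      // The column holding the largest value in this row is taken as the cluster.
      val cluster = row.indices.reduceLeft((best, i) => if (row(i) > row(best)) i else best)
      (cluster, (doc, row(cluster)))
    }
    assignments.groupBy(_._1).map { case (cluster, group) => cluster -> group.map(_._2) }
  }
}
```

Subtracting a constant does not change which column holds the maximum of a row, so in this sketch the shift only affects the retained weights, not the cluster assignment.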
Subtract the minimum value of the matrix from every entry.
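As a small illustration of this shift, assuming the matrix is held as an `Array[Array[Double]]` with at least one entry (the names `MatrixShiftSketch` and `subtractMin` are hypothetical):

```scala
object MatrixShiftSketch {
  // Subtract the smallest entry from every entry, so the minimum becomes 0.
  def subtractMin(m: Array[Array[Double]]): Array[Array[Double]] = {
    val min = m.flatten.reduceLeft((a, b) => if (b < a) b else a)
    m.map(row => row.map(_ - min))
  }
}
```

For example, `subtractMin(Array(Array(-2.0, 1.0), Array(0.0, 3.0)))` yields the rows `(0.0, 3.0)` and `(2.0, 5.0)`.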
The LsiDocumentCluster approach uses the selected set of k components in order to generate k clusters.
Documents are associated with clusters by the use of the U matrix.
Terms are associated with clusters by the use of the Vt matrix.
If there are k components, the U matrix has dimension (m x k),
where m is the number of documents.
The Vt matrix has dimension (k x n), where n is the number of attributes.
Additionally, the correlation matrices for attributes can be used to select highly correlated attributes once they have been clustered.
This produces a cluster labelling for documents and terms based on the components defined in S.
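The correlation-based selection mentioned above might look like the following sketch. It assumes an already computed n x n attribute correlation matrix and a list of attribute column indices belonging to one cluster. The names `CorrelatedAttributeSketch` and `correlatedPairs`, and the `threshold` parameter that defines "highly correlated", are hypothetical and not part of the original code.

```scala
object CorrelatedAttributeSketch {
  // corr: n x n attribute correlation matrix.
  // cluster: column indices of the attributes assigned to one cluster.
  // threshold: hypothetical cut-off for "highly correlated".
  // Returns index pairs (i, j) with i < j whose correlation is at least the threshold.
  def correlatedPairs(
      corr: Array[Array[Double]],
      cluster: Seq[Int],
      threshold: Double): Seq[(Int, Int)] =
    for {
      i <- cluster
      j <- cluster
      if i < j && corr(i)(j) >= threshold
    } yield (i, j)
}
```

Restricting the comparison to `i < j` reports each pair once and skips the diagonal, where every attribute correlates perfectly with itself.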
Created by cd on 13/1/17.