Trait

au.id.cxd.text.model

LsiDocumentSearch

trait LsiDocumentSearch extends AnyRef

Implementation of document search for a supplied input query array. The input query should contain terms that already exist in the LSI model's vocabulary: the LSI model contains a term map, and any query terms not found in that vocabulary are returned with the result. The cosine distance is returned unnormalised.
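The vocabulary check described above can be sketched as a simple partition of the query. This is a minimal illustration, not the library's code: `termMap` is a hypothetical stand-in for the LSI model's term map (term to column index), and `QueryVocabularySketch.splitQuery` is an illustrative name.

```scala
// Hypothetical sketch: `termMap` stands in for the LSI model's term map
// (term -> column index); it is not the library's actual field or API.
object QueryVocabularySketch {
  /** Split a query into the column indices of known terms and the terms missing from the vocabulary. */
  def splitQuery(query: Array[String], termMap: Map[String, Int]): (Seq[Int], Seq[String]) = {
    // Keep query order; terms absent from the map go to `unknown`.
    val (known, unknown) = query.toSeq.partition(termMap.contains)
    (known.map(termMap), unknown)
  }
}
```

Repeated query terms are kept in whichever list they fall into, so a caller can still count them when weighting.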

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. def makeSearchSpace(lsi: LatentSemanticIndex): (DenseMatrix[Double], DenseMatrix[Double], DenseMatrix[Double])

    Generates the search space for the U and V components by multiplying each of them by the square root of the singular-value diagonal matrix.

    Note: the search space may need to be cached.
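    The scaling step above amounts to right-multiplying U and V by diag(sqrt(S)), i.e. multiplying column j by the square root of the j-th singular value. The sketch below assumes plain row-major arrays instead of Breeze `DenseMatrix` to stay dependency-free, takes V (not V'), and returns only the two scaled components; the third matrix in the real return triple is not documented here, so it is left out. `SearchSpaceSketch` and its methods are hypothetical names.

```scala
// Minimal sketch with plain arrays; the trait itself works with Breeze DenseMatrix.
object SearchSpaceSketch {
  type Matrix = Array[Array[Double]] // row-major: m(row)(col)

  /** Compute m * diag(w): multiply column j of m by w(j). */
  def scaleColumns(m: Matrix, w: Array[Double]): Matrix = {
    require(m.forall(_.length == w.length), "column count must match weight count")
    m.map(row => row.zip(w).map { case (x, y) => x * y })
  }

  /** Returns (U * sqrt(S), V * sqrt(S)) where S is given as a vector of singular values. */
  def scaledComponents(u: Matrix, s: Array[Double], v: Matrix): (Matrix, Matrix) = {
    val sqrtS = s.map(math.sqrt)
    (scaleColumns(u, sqrtS), scaleColumns(v, sqrtS))
  }
}
```

    If only V' is available, transpose it first so that each row of V is one vector in the search space.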

  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  16. def performSearch(searchSpace: (DenseMatrix[Double], DenseMatrix[Double], DenseMatrix[Double]), vectoriser: DocumentTermVectoriser, query: Array[String], stopWords: Seq[String], lsi: LatentSemanticIndex, stemQuery: Boolean = true): Buffer[(Int, Double, Seq[String])]

    Searches the LSI model with an array query.
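    The search itself ranks documents by cosine similarity to the query. The sketch below is an assumption-laden illustration, not the library's implementation: it takes a query vector already in the same space as the document rows, uses the standard normalised cosine similarity (how this relates to the "unnormalised" score in the trait description is not confirmed here), and returns only (document index, score) pairs, omitting the `Seq[String]` element of the real result. `CosineSearchSketch` is a hypothetical name.

```scala
// Hypothetical sketch of cosine-similarity ranking over document vectors.
object CosineSearchSketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))

  /** Cosine similarity; returns 0.0 if either vector is all zeros. */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }

  /** Score every document row against the query, best match first. */
  def rank(query: Array[Double], docs: Array[Array[Double]]): Seq[(Int, Double)] =
    docs.toSeq.zipWithIndex
      .map { case (d, i) => (i, cosine(query, d)) }
      .sortWith((x, y) => x._2 > y._2)
}
```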

  17. def preprocessQuery(vectoriser: DocumentTermVectoriser)(query: Array[String], stopwords: Seq[String], lsi: LatentSemanticIndex, stemQuery: Boolean = true): DenseVector[Double]

    Converts the query into a term vector.

    lsi: LatentSemanticIndex

    returns

    The query term vector. Currently this is a TF-IDF vector for the query, computed from the LSI model. Because there are other ways of weighting terms, this will be changed to accept a counting trait that calculates the term weights.
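    A TF-IDF query vector of the kind described above can be sketched as raw term counts weighted by inverse document frequency. This assumes a hypothetical `termMap` (term to column index) and a precomputed `idf` array indexed the same way; it skips stemming (the real method stems by default via `stemQuery`) and uses raw counts as term frequency, which is only one of several weighting choices. `QueryTfIdfSketch` is a hypothetical name.

```scala
// Hypothetical sketch: termMap and idf stand in for data held by the LSI model.
object QueryTfIdfSketch {
  def queryVector(query: Array[String], stopWords: Seq[String],
                  termMap: Map[String, Int], idf: Array[Double]): Array[Double] = {
    val vec = Array.fill(idf.length)(0.0)
    // Raw term frequency: count each non-stop-word term that is in the vocabulary.
    for (term <- query if !stopWords.contains(term); idx <- termMap.get(term))
      vec(idx) += 1.0
    // Weight each count by that term's inverse document frequency.
    for (i <- vec.indices) vec(i) *= idf(i)
    vec
  }
}
```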

  18. def reduceToDimensons(lsi: LatentSemanticIndex, k: Int): LatentSemanticIndex

    The singular values can indicate how many dimensions should be retained. After manual analysis, it may be decided to retain only a fixed number of dimensions.

    This results in an SVD of reduced dimensionality, where $k$ is the number of principal components retained.

    The original SVD is

    $$ \hat{X} = U S V' $$

    where $U$ has dimension $(m \times n)$, $S$ has size $n$, and $V'$ has dimension $(n \times n)$.

    If we choose a dimension $k < n$, then $U$ has dimension $(m \times k)$, $S$ has size $k$, and $V'$ has dimension $(k \times n)$.

    Note that this does not reduce the original TF-IDF matrix; it reduces the number of dimensions of the projections of the TF-IDF matrix into the search space.
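    The reduction above keeps the first $k$ columns of $U$, the first $k$ singular values, and the first $k$ rows of $V'$. The sketch below assumes singular values sorted in descending order (the usual SVD convention) and plain row-major arrays in place of the Breeze matrices held by `LatentSemanticIndex`; `TruncateSvdSketch` is a hypothetical name.

```scala
// Minimal sketch of truncating an SVD to k dimensions, using plain arrays.
object TruncateSvdSketch {
  type Matrix = Array[Array[Double]] // row-major: m(row)(col)

  /** Keep the first k columns of U, the first k singular values, and the first k rows of V'. */
  def truncate(u: Matrix, s: Array[Double], vt: Matrix, k: Int): (Matrix, Array[Double], Matrix) = {
    require(k >= 1 && k < s.length, s"k must satisfy 1 <= k < ${s.length}")
    (u.map(_.take(k)), s.take(k), vt.take(k))
  }
}
```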

  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
