Trait

au.id.cxd.text.model

LatentSemanticIndexBuilder


trait LatentSemanticIndexBuilder extends AnyRef

The pipeline used to build the latent semantic index:

- build the TF-IDF matrix
- perform the SVD
- capture the entropy in the data set and the contribution of each singular component.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def buildFromCsv(inputCsv: String, docIdCols: Seq[Int], skipHeader: Boolean = true, stemTerms: Boolean = true): (Double, DenseVector[Double], LatentSemanticIndex)

    Recompute the entire latent semantic index from the data stored in the supplied CSV file. Each row of the CSV file must include one or more document ID columns (see docIdCols), with the document text content in the remaining column.

    Example:

    DocID1,SubID2,Text
    1, 111, "This is some text data"

    Fields should be quoted with either " or '.

    returns

    A tuple containing the entropy of the terms calculated from the data set, the contribution of each singular component to the total entropy of the document set, and the latent semantic index.

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def computeSvd(tfIdf: DenseMatrix[Double]): (DenseSVD, Double, DenseVector[Double])

    Compute the SVD, along with the entropy of the data set and the contribution of each singular value component.
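
    The source does not state how the entropy or the component contributions are defined. As a minimal sketch of one common convention (an assumption, not necessarily what this library does), the squared singular values can be normalised into proportions and their Shannon entropy taken. The object `SingularValueEntropy` and its methods are hypothetical names introduced for illustration only.

    ```scala
    // Hypothetical helper, not part of au.id.cxd.text.model.
    object SingularValueEntropy {

      // Assumption: each component's contribution is its squared singular
      // value divided by the sum of all squared singular values.
      def proportions(singularValues: Seq[Double]): Seq[Double] = {
        val squared = singularValues.map(s => s * s)
        val total = squared.sum
        squared.map(_ / total)
      }

      // Shannon entropy (natural log) of a discrete distribution.
      // Zero-probability entries are skipped because p * ln(p) tends to 0.
      def entropy(p: Seq[Double]): Double =
        -p.filter(_ > 0.0).map(x => x * math.log(x)).sum
    }
    ```

    Under this convention, a flat spectrum gives the maximum entropy, while a spectrum dominated by one singular value gives an entropy near zero.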

  8. def computeTfIdf(terms: Seq[Array[String]]): (Map[Int, (String, Int, Int)], DenseMatrix[Double])

    Compute the TF-IDF matrix.
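
    To make the term concrete, here is a minimal sketch of a TF-IDF computation in plain Scala. It takes the same `Seq[Array[String]]` input shape as the signature above, but returns nested `Vector`s instead of a Breeze `DenseMatrix`, and it assumes the weighting `tf * ln(N / df)`. The weighting actually used by `computeTfIdf` is not stated here, and `TfIdfSketch` is a hypothetical name.

    ```scala
    // Hypothetical sketch, not the library's implementation.
    object TfIdfSketch {

      // Returns the sorted vocabulary and one row of weights per document.
      def tfIdf(docs: Seq[Array[String]]): (Vector[String], Vector[Vector[Double]]) = {
        val vocab = docs.flatMap(_.toSeq).distinct.sorted.toVector
        val n = docs.size.toDouble

        // Document frequency: how many documents contain each term.
        val df: Map[String, Int] =
          vocab.map(t => t -> docs.count(d => d.contains(t))).toMap

        val rows = docs.toVector.map { doc =>
          // Raw term counts within this document.
          val counts = doc.groupBy(identity).map { case (t, occ) => t -> occ.length }
          vocab.map { t =>
            val tf = counts.getOrElse(t, 0).toDouble
            tf * math.log(n / df(t)) // assumed weighting: tf * ln(N / df)
          }
        }
        (vocab, rows)
      }
    }
    ```

    With this weighting, a term that occurs in every document receives a weight of zero, since `ln(N / N) = 0`.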

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def extractStemmedTerms(stopwords: Seq[String], lines: Seq[String]): Seq[Array[String]]

    Extract stemmed terms from the supplied sequence of lines.

  12. def extractTerms(stopwords: Seq[String], lines: Seq[String]): Seq[Array[String]]

    Extract terms from the supplied sequence of lines without stemming.
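
    As an illustration of what term extraction without stemming might look like, the sketch below lower-cases each line, splits on non-alphanumeric characters, and drops stopwords. The tokenisation rule is an assumption; the library's own tokeniser may behave differently. `TermExtractionSketch` is a hypothetical name that only mirrors the signature above.

    ```scala
    // Hypothetical sketch, not the library's implementation.
    object TermExtractionSketch {

      def extractTerms(stopwords: Seq[String], lines: Seq[String]): Seq[Array[String]] = {
        val stop = stopwords.map(_.toLowerCase).toSet
        lines.map { line =>
          line.toLowerCase
            .split("[^a-z0-9]+") // assumed rule: split on anything that is not a letter or digit
            .filter(t => t.nonEmpty && !stop.contains(t))
        }
      }
    }
    ```

    Each input line yields one array of terms, so the result lines up row by row with the documents.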

  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. def loadStopWords(): Seq[String]

    Load the default set of stopwords using the embedded stopwords loader.

  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. def readIndexedCsv(path: String, docIdCols: Seq[Int], skipHeader: Boolean = true): (Map[Int, Seq[String]], ListBuffer[String])

    Read an indexed CSV file containing the document IDs for each row and the text of each document.
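
    The sketch below shows one way such a reader could work. It operates on lines already held in memory rather than a file path, and it assumes that the returned map is keyed by row index and holds that row's document IDs, while the buffer holds the text of each row. That reading of the return type is a guess from the signature. `IndexedCsvSketch`, `splitLine`, and `readIndexed` are hypothetical names, and the quote handling is deliberately simple (no escaped quotes).

    ```scala
    import scala.collection.mutable.ListBuffer

    // Hypothetical sketch, not the library's implementation.
    object IndexedCsvSketch {

      // Split one CSV line on commas that fall outside "..." or '...' quotes.
      def splitLine(line: String): Seq[String] = {
        val fields = ListBuffer.empty[String]
        val current = new StringBuilder
        var quote: Option[Char] = None
        for (c <- line) {
          quote match {
            case Some(q) if c == q => quote = None // closing quote
            case Some(_) => current.append(c) // inside a quoted field
            case None if c == '"' || c == '\'' => quote = Some(c) // opening quote
            case None if c == ',' =>
              fields += current.toString.trim
              current.clear()
            case None => current.append(c)
          }
        }
        fields += current.toString.trim
        fields.toList
      }

      // Assumed meaning: row index -> document IDs, plus the text of each row.
      def readIndexed(lines: Seq[String],
                      docIdCols: Seq[Int],
                      skipHeader: Boolean = true): (Map[Int, Seq[String]], ListBuffer[String]) = {
        val rows = (if (skipHeader) lines.drop(1) else lines).map(splitLine)
        val ids = rows.zipWithIndex.map { case (row, i) =>
          i -> docIdCols.map(col => row(col))
        }.toMap
        val text = ListBuffer.empty[String]
        rows.foreach { row =>
          val textCols = row.indices.filterNot(i => docIdCols.contains(i))
          text += textCols.map(i => row(i)).mkString(" ")
        }
        (ids, text)
      }
    }
    ```

    For the two-line example given under buildFromCsv, with `docIdCols = Seq(0, 1)`, this sketch maps row 0 to the IDs `1` and `111` and stores `This is some text data` as its text.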

  22. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
