org.apache.mahout.classifier.naivebayes

NaiveBayes

trait NaiveBayes extends Serializable

Distributed training of a Naive Bayes model. Follows the approach presented in Rennie et al., "Tackling the Poor Assumptions of Naive Bayes Text Classifiers", ICML 2003, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Linear Supertypes
Serializable, AnyRef, Any

Type Members

  1. type CategoryParser = (String) ⇒ String

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def argmax(v: Vector): (Int, Double)

    Returns the index of the maximum score together with the score itself.

    v

    Vector of scores

    returns

    (bestIndex, bestScore)
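
The contract described above can be illustrated with a minimal plain-Scala sketch. It is not Mahout's implementation: `argmaxSketch` is a hypothetical name, it takes an `IndexedSeq[Double]` rather than a Mahout `Vector`, and its tie-breaking rule (first maximum wins) is an assumption.

```scala
// Hypothetical stand-in for argmax: operates on a plain IndexedSeq[Double]
// instead of a Mahout Vector.
def argmaxSketch(scores: IndexedSeq[Double]): (Int, Double) = {
  require(scores.nonEmpty, "scores must not be empty")
  var bestIndex = 0
  var bestScore = scores(0)
  for (i <- 1 until scores.length) {
    // Strict comparison: on ties, the earliest index is kept (an assumption).
    if (scores(i) > bestScore) {
      bestIndex = i
      bestScore = scores(i)
    }
  }
  (bestIndex, bestScore)
}
```

For example, `argmaxSketch(Vector(-3.2, -1.5, -2.0))` yields `(1, -1.5)`.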

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def defaultAlphaI: Float

    Default value for the Laplace smoothing parameter.

  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def extractLabelsAndAggregateObservations[K](stringKeyedObservations: DrmLike[K], cParser: (String) ⇒ String = seq2SparseCategoryParser)(implicit arg0: ClassTag[K], ctx: DistributedContext): (HashMap[String, Integer], DrmLike[Int])

    Extracts label keys from the raw TF or TF-IDF matrix generated by seqdirectory/seq2sparse and aggregates the TF or TF-IDF values by label. Override this method in engine-specific modules to optimize it.

    stringKeyedObservations

    DrmLike matrix output by seq2sparse, with keys of the form /Category/document_title and values holding the TF or TF-IDF weight of each term.

    cParser

    a String => String function used to extract categories from Keys of the stringKeyedObservations DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'

    returns

    (labelIndexMap, aggregatedByLabelObservationDrm). labelIndexMap is a HashMap[String, Integer] mapping each label (key) to its row index (value). aggregatedByLabelObservationDrm is a DrmLike[Int] of TF or TF-IDF values aggregated per label.
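
The label extraction and aggregation step can be sketched on in-memory collections. This is an illustration under stated assumptions, not the distributed implementation: `aggregateByLabel` is a hypothetical name, rows are `(key, Array[Double])` pairs standing in for a string-keyed DRM, all rows are assumed to have the same length, and the label index is a `Map[String, Int]` rather than a `HashMap[String, Integer]`.

```scala
// Hypothetical in-memory analogue of label extraction and aggregation.
// rows: (key, term weights) pairs; every row is assumed to have the same length.
def aggregateByLabel(rows: Seq[(String, Array[Double])],
                     cParser: String => String): (Map[String, Int], Array[Array[Double]]) = {
  // Assign each distinct label a row index, in order of first appearance.
  val labelIndex: Map[String, Int] =
    rows.map(r => cParser(r._1)).distinct.zipWithIndex.toMap
  val numTerms = rows.headOption.map(_._2.length).getOrElse(0)
  // One aggregated row per label, initialised to zero.
  val aggregated = Array.fill(labelIndex.size, numTerms)(0.0)
  for ((key, values) <- rows) {
    val target = aggregated(labelIndex(cParser(key)))
    for (j <- values.indices) target(j) += values(j)
  }
  (labelIndex, aggregated)
}

// Usage with a parser that takes the segment between the first two slashes.
val (exampleLabelIndex, exampleAggregated) = aggregateByLabel(
  Seq(
    ("/sports/d1", Array(1.0, 0.0)),
    ("/tech/d2", Array(0.0, 2.0)),
    ("/sports/d3", Array(3.0, 1.0))
  ),
  key => key.split("/")(1)
)
```

Here the two "sports" rows are summed into a single row, and the label index records which aggregated row belongs to which label.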

  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. def seq2SparseCategoryParser: (String) ⇒ String

    The default category parser. seqdirectory/seq2sparse stores categories in DRM keys as /Category/document_id.
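
A parser for this key layout might look like the following sketch. `categoryParserSketch` is a hypothetical name and the validation it performs is an assumption; it only illustrates a function with the shape of the `CategoryParser` type, `(String) ⇒ String`.

```scala
// Hypothetical category parser for keys shaped like "/Category/document_id".
val categoryParserSketch: String => String = { key =>
  // Splitting "/Category/document_id" on "/" gives ("", "Category", "document_id").
  val parts = key.split("/")
  require(parts.length >= 2 && parts(1).nonEmpty, s"unexpected key format: $key")
  parts(1)
}
```

A custom function of the same type can be passed as `cParser` when keys follow a different layout.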

  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def test[K](model: NBModel, testSet: DrmLike[K], testComplementary: Boolean = false, cParser: (String) ⇒ String = seq2SparseCategoryParser)(implicit arg0: ClassTag[K], ctx: DistributedContext): ResultAnalyzer

    Sequentially tests a trained model against a labeled dataset.

    K

    implicitly determined Key type of test set DRM: String

    model

    a trained NBModel

    testSet

    a labeled testing set

    testComplementary

    test using a complementary or a standard NB classifier

    cParser

    a String => String function used to extract categories from Keys of the testing set DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'

    *Note*: this method loads the entire test set into memory up front. An optimized, parallelized version is provided in SparkNaiveBayes.

    returns

    a result analyzer with confusion matrix and accuracy statistics
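
The kind of statistics such an analyzer reports can be sketched in plain Scala. The `ResultAnalyzer` API is not shown on this page, so the functions below (`confusionMatrix`, `accuracy`) are hypothetical helpers that only illustrate the idea: labels are assumed to be already mapped to row indices, and each test document contributes one (actual, predicted) pair.

```scala
// Hypothetical helpers; not the ResultAnalyzer API.
// Each pair is (actual label index, predicted label index).
def confusionMatrix(numLabels: Int, actualAndPredicted: Seq[(Int, Int)]): Array[Array[Int]] = {
  val m = Array.fill(numLabels, numLabels)(0)
  // Row = actual label, column = predicted label.
  for ((actual, predicted) <- actualAndPredicted) m(actual)(predicted) += 1
  m
}

// Fraction of documents on the diagonal (predicted == actual).
def accuracy(m: Array[Array[Int]]): Double = {
  val total = m.map(_.sum).sum
  val correct = m.indices.map(i => m(i)(i)).sum
  if (total == 0) 0.0 else correct.toDouble / total
}
```

With the pairs `(0,0), (0,1), (1,1), (1,1)` and two labels, three of four predictions are correct, so `accuracy` returns 0.75.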

  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. def train(observationsPerLabel: DrmLike[Int], labelIndex: Map[String, Integer], trainComplementary: Boolean = true, alphaI: Float = defaultAlphaI): NBModel

    Distributed training of a Naive Bayes model. Follows the approach presented in Rennie et al., "Tackling the Poor Assumptions of Naive Bayes Text Classifiers", ICML 2003, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

    observationsPerLabel

    a DrmLike[Int] matrix containing term frequency counts for each label.

    trainComplementary

    whether or not to train a complementary Naive Bayes model

    alphaI

    Laplace smoothing parameter

    returns

    a trained Naive Bayes model
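
The smoothed weight estimates from Rennie et al. can be sketched on a small dense matrix. This is a simplified illustration, not the code behind `train`: the function names are hypothetical, `counts(c)(i)` stands in for the per-label observation DRM, the same `alphaI` is used for every term, and the paper's weight normalization and text transforms are omitted. How Mahout stores or signs these weights in `NBModel` is not covered here.

```scala
// counts(c)(i): aggregated weight of term i under label c
// (hypothetical dense stand-in for the DrmLike[Int] of observations per label).

// Standard NB: w(c)(i) = log((N_ci + alphaI) / (N_c + alphaI * numTerms))
def standardWeights(counts: Array[Array[Double]], alphaI: Double): Array[Array[Double]] = {
  require(counts.nonEmpty, "need at least one label")
  val numTerms = counts.head.length
  counts.map { row =>
    val denom = row.sum + alphaI * numTerms
    row.map(n => math.log((n + alphaI) / denom))
  }
}

// Complement NB: the same estimate, computed from all labels other than c.
def complementWeights(counts: Array[Array[Double]], alphaI: Double): Array[Array[Double]] = {
  require(counts.nonEmpty, "need at least one label")
  val numTerms = counts.head.length
  // Total weight of each term over all labels, and the grand total.
  val termTotals = Array.tabulate(numTerms)(i => counts.map(_(i)).sum)
  val grandTotal = termTotals.sum
  counts.map { row =>
    val denom = (grandTotal - row.sum) + alphaI * numTerms
    Array.tabulate(numTerms)(i => math.log((termTotals(i) - row(i) + alphaI) / denom))
  }
}
```

In the complement variant, a term that is rare outside a label receives a low complement weight for that label, which is why Rennie et al. select the label with the smallest weighted sum rather than the largest.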

  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
