IDFModel
- class pyspark.mllib.feature.IDFModel(java_model)
Represents an IDF model that can transform term frequency vectors.
New in version 1.2.0.
Methods
- call(name, *a): Calls a method of java_model.
- docFreq(): Returns the document frequency.
- idf(): Returns the current IDF vector.
- numDocs(): Returns the number of documents evaluated to compute the IDF.
- transform(x): Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation
- call(name, *a)
Call method of java_model
- transform(x)
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, the terms which occur in fewer than minDocFreq documents will have an entry of 0.
New in version 1.2.0.
- Parameters
- x : pyspark.mllib.linalg.Vector or pyspark.RDD
an RDD of term frequency vectors or a term frequency vector
- Returns
pyspark.mllib.linalg.Vector or pyspark.RDD
an RDD of TF-IDF vectors or a TF-IDF vector
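The effect of transform on a single vector, including the minDocFreq rule described above, can be illustrated with a pure-Python sketch. This is not Spark's implementation. It assumes the smoothed formula idf = log((m + 1) / (df + 1)) given in the MLlib feature extraction guide, where m is the number of documents and df is a term's document frequency. The helper names `idf_vector` and `tf_to_tfidf` and all the numbers are made up for illustration.

```python
import math


def idf_vector(doc_freqs, num_docs, min_doc_freq=0):
    """Illustrative IDF weights (assumed formula, not Spark's code).

    A term seen in fewer than min_doc_freq documents gets weight 0.0;
    otherwise the weight is log((num_docs + 1) / (df + 1)).
    """
    return [
        math.log((num_docs + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for df in doc_freqs
    ]


def tf_to_tfidf(tf, idf):
    """Scale each term frequency by the matching IDF weight."""
    return [t * w for t, w in zip(tf, idf)]


# Made-up corpus statistics: 4 documents, 3 terms.
doc_freqs = [3, 1, 2]
idf = idf_vector(doc_freqs, num_docs=4, min_doc_freq=2)

# Term 1 occurs in only one document (< min_doc_freq), so its entry is 0.
tfidf = tf_to_tfidf([1.0, 3.0, 2.0], idf)
print(tfidf)
```

The second entry is zeroed no matter how large its term frequency is, because its IDF weight was set to 0 when the model was fitted.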
Notes
In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.
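A minimal sketch of the usage pattern in the note above, assuming an existing SparkContext is passed in as `sc`. The helper name `build_tfidf` and its parameters are hypothetical. It uses `HashingTF` and `IDF` from `pyspark.mllib.feature` to produce the term frequency RDD and the fitted `IDFModel`.

```python
def build_tfidf(sc, documents, num_features=1024, min_doc_freq=0):
    """Hypothetical helper: fit an IDFModel and apply it to a whole RDD.

    sc is an existing SparkContext; documents is a list of token lists.
    """
    # Imported locally so this module can be loaded without Spark installed.
    from pyspark.mllib.feature import HashingTF, IDF

    tokens = sc.parallelize(documents)
    tf = HashingTF(numFeatures=num_features).transform(tokens)
    tf.cache()  # the TF RDD is read twice: by fit() and by transform()

    model = IDF(minDocFreq=min_doc_freq).fit(tf)  # returns an IDFModel

    # Supported: pass the RDD itself to transform().
    tfidf = model.transform(tf)

    # Not supported in Python, per the note above:
    #   tf.map(lambda v: model.transform(v))
    return model, tfidf
```

With a running SparkContext, `model, tfidf = build_tfidf(sc, [["a", "b"], ["a", "c"]])` returns the fitted model and an RDD of TF-IDF vectors, which can be inspected with `tfidf.collect()`.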