Package org.apache.spark.ml.feature
Interface Word2VecBase
All Superinterfaces:
HasInputCol, HasMaxIter, HasOutputCol, HasSeed, HasStepSize, Identifiable, Params, Serializable
All Known Implementing Classes:
Word2Vec, Word2VecModel
public interface Word2VecBase
extends Params, HasInputCol, HasOutputCol, HasMaxIter, HasStepSize, HasSeed
Params for Word2Vec and Word2VecModel.
Method Summary
Modifier and Type   Method                                          Description
int                 getMaxSentenceLength()
int                 getMinCount()
int                 getNumPartitions()
int                 getVectorSize()
int                 getWindowSize()
IntParam            maxSentenceLength()                             Sets the maximum length (in words) of each sentence in the input data.
IntParam            minCount()                                      The minimum number of times a token must appear to be included in the word2vec model's vocabulary.
IntParam            numPartitions()                                 Number of partitions for sentences of words.
StructType          validateAndTransformSchema(StructType schema)   Validate and transform the input schema.
IntParam            vectorSize()                                    The dimension of the code that you want to transform from words.
IntParam            windowSize()                                    The window size (context words from [-window, window]).
Methods inherited from interface org.apache.spark.ml.param.shared.HasInputCol
getInputCol, inputCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter, maxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasOutputCol
getOutputCol, outputCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasStepSize
getStepSize, stepSize
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString, uid
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Method Details
getMaxSentenceLength
int getMaxSentenceLength()
getMinCount
int getMinCount()
getNumPartitions
int getNumPartitions()
getVectorSize
int getVectorSize()
getWindowSize
int getWindowSize()
maxSentenceLength
IntParam maxSentenceLength()
Sets the maximum length (in words) of each sentence in the input data. Any sentence longer than this threshold will be divided into chunks of up to maxSentenceLength size. Default: 1000
Returns: (undocumented)
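The chunking rule described for maxSentenceLength can be pictured with a small standalone sketch. It is only an illustration of "divided into chunks of up to maxSentenceLength size" and is not taken from Spark's implementation; the class name SentenceChunks and the method chunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: illustrates the chunking rule only.
final class SentenceChunks {
    // Splits one tokenized sentence into consecutive chunks of at most
    // maxSentenceLength words; the last chunk may be shorter.
    public static List<List<String>> chunk(List<String> sentence, int maxSentenceLength) {
        if (maxSentenceLength <= 0) {
            throw new IllegalArgumentException("maxSentenceLength must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < sentence.size(); start += maxSentenceLength) {
            int end = Math.min(start + maxSentenceLength, sentence.size());
            chunks.add(new ArrayList<>(sentence.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(chunk(sentence, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

With the default of 1000, a sentence of 1000 words or fewer stays in a single chunk.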
minCount
IntParam minCount()
The minimum number of times a token must appear to be included in the word2vec model's vocabulary. Default: 5
Returns: (undocumented)
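As a rough illustration of the minCount rule, the sketch below counts tokens across all sentences and keeps only those that appear at least minCount times. It assumes "must appear" means a total count over the whole input; the class MinCountVocabulary and its build method are hypothetical and do not reflect how Spark builds its vocabulary internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Spark: illustrates the minCount threshold only.
final class MinCountVocabulary {
    // Returns the tokens whose total count over all sentences is >= minCount.
    public static Set<String> build(List<List<String>> sentences, int minCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        Set<String> vocabulary = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                vocabulary.add(entry.getKey());
            }
        }
        return vocabulary;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("spark", "ml", "spark"),
            Arrays.asList("spark", "word2vec", "ml"));
        // "spark" appears 3 times, "ml" 2 times, "word2vec" once.
        System.out.println(build(sentences, 2)); // prints [ml, spark]
    }
}
```

On small test corpora the default of 5 can leave the vocabulary empty, so a lower value is often needed there.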
numPartitions
IntParam numPartitions()
Number of partitions for sentences of words. Default: 1
Returns: (undocumented)
validateAndTransformSchema
StructType validateAndTransformSchema(StructType schema)
Validate and transform the input schema.
Parameters: schema - (undocumented)
Returns: (undocumented)
vectorSize
IntParam vectorSize()
The dimension of the code (the output vector) that each word is transformed into. Default: 100
Returns: (undocumented)
windowSize
IntParam windowSize()
The window size (context words from [-window, window]). Default: 5
Returns: (undocumented)
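The range [-window, window] can be read as "up to window words on each side of the current word". The sketch below lists those context words for one position, clipped at the sentence boundaries. It is an illustration of the range only; the class ContextWindow and its context method are hypothetical, and actual training code may apply further sampling within this range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: illustrates the [-window, window] range only.
final class ContextWindow {
    // Returns the words at offsets -window..window around position center,
    // excluding the center word itself and clipping at the sentence ends.
    public static List<String> context(List<String> sentence, int center, int window) {
        List<String> context = new ArrayList<>();
        int from = Math.max(0, center - window);
        int to = Math.min(sentence.size() - 1, center + window);
        for (int i = from; i <= to; i++) {
            if (i != center) {
                context.add(sentence.get(i));
            }
        }
        return context;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("the", "quick", "brown", "fox", "jumps", "over");
        System.out.println(context(sentence, 2, 1)); // prints [quick, fox]
        System.out.println(context(sentence, 0, 2)); // prints [quick, brown]
    }
}
```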