All Classes and Interfaces (Spark 4.1.0 JavaDoc)

Class

Description

AbsoluteError

Class for absolute error loss calculation (for regression).

AbstractLauncher<T extends AbstractLauncher<T>>

Base class for launcher implementations.

AcceptsLatestSeenOffset

Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.

AccumulableInfo

:: DeveloperApi :: Information about an AccumulatorV2 modified during a task or stage.

AccumulableInfo

AccumulableInfoSerializer

AccumulatorContext

An internal class used to track accumulators by Spark itself.

AccumulatorV2<IN,OUT>

The base class for accumulators, that can accumulate inputs of type IN, and produce output of type OUT.

ActivationFunction

Trait for functions and their derivatives for functional layers

AFTSurvivalRegression

Fit a parametric survival regression model named accelerated failure time (AFT) model (see Accelerated failure time model (Wikipedia)) based on the Weibull distribution of the survival time.

AFTSurvivalRegressionModel

Model produced by AFTSurvivalRegression.

AFTSurvivalRegressionModel.Data$

AFTSurvivalRegressionParams

Params for accelerated failure time (AFT) regression.

AggregatedDialect

AggregatedDialect can unify multiple dialects into one virtual Dialect.

AggregateFunc

Base class of the Aggregate Functions.

AggregateFunction<S extends Serializable,R>

Interface for a function that produces a result value by aggregating over multiple input rows.

AggregatingEdgeContext<VD,ED,A>

Aggregation

Aggregation in SQL statement.

Aggregator<K,V,C>

:: DeveloperApi :: A set of functions used to aggregate data.

Aggregator<IN,BUF,OUT>

A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

Algo

Enum to select the algorithm for the decision tree

AllFlows

Used in full graph update to select all flows.

AllJobsCancelled

AllReceiverIds

A message used by ReceiverTracker to ask for the IDs of all receivers still stored in ReceiverTrackerEndpoint.

AllTables

Used in full graph updates to select all tables.

ALS

Alternating Least Squares (ALS) matrix factorization.

ALS

Alternating Least Squares matrix factorization.

ALS.InBlock$

ALS.LeastSquaresNESolver

Trait for least squares solvers applied to the normal equation.

ALS.Rating<ID>

Rating class for better code readability.

ALSModel

Model fitted by ALS.

ALSModelParams

Common params for ALS and ALSModel.

ALSParams

Common params for ALS.

AlwaysFalse

A predicate that always evaluates to false.

AlwaysFalse

A filter that always evaluates to false.

AlwaysTrue

A predicate that always evaluates to true.

AlwaysTrue

A filter that always evaluates to true.

AnalysisException

Thrown when a query fails to analyze, usually because the query itself is invalid.

AnalysisWarning

Represents a warning generated as part of graph analysis.

AnalysisWarning.StreamingReaderOptionsDropped

Warning that some streaming reader options are being dropped

AnalysisWarning.StreamingReaderOptionsDropped$

And

A predicate that evaluates to true iff both left and right evaluate to true.

And

A filter that evaluates to true iff both left and right evaluate to true.

ANOVATest

ANOVA Test for continuous data.

AnyDataType

An AbstractDataType that matches any concrete data types.

AnyTimestampType

AnyTimestampTypeExpression

ApiHelper

ApiRequestContext

AppendOnceFlow

A Flow that reads source[s] completely and appends data to the target, just once.

AppHistoryServerPlugin

An interface for creating history listeners (to replay event logs) defined in other modules like SQL, and for setting up the UI of the plugin to rebuild the history UI.

ApplicationAttemptInfo

ApplicationEnvironmentInfo

ApplicationInfo

ApplicationStatus

ApplyInPlace

Implements in-place application of functions in the arrays

ApproximateEvaluator<U,R>

An object that computes a function incrementally by merging in results of type U from multiple tasks.

AppStatusUtils

AreaUnderCurve

Computes the area under the curve (AUC) using the trapezoidal rule.
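
The trapezoidal rule mentioned above can be illustrated with a minimal, self-contained Java sketch. It does not call Spark's AreaUnderCurve; the class and method names (`TrapezoidAucSketch`, `areaUnderCurve`) are hypothetical, and the curve points are assumed to be sorted by x.

```java
// Hypothetical standalone sketch; not Spark's AreaUnderCurve implementation.
final class TrapezoidAucSketch {
    /**
     * Sums the trapezoid areas between consecutive (x, y) points.
     * Assumes xs and ys have equal length and xs is sorted ascending.
     */
    static double areaUnderCurve(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        double area = 0.0;
        for (int i = 1; i < xs.length; i++) {
            double width = xs[i] - xs[i - 1];
            double meanHeight = (ys[i] + ys[i - 1]) / 2.0;
            area += width * meanHeight;
        }
        return area;
    }

    public static void main(String[] args) {
        // ROC-like curve through (0, 0), (0.5, 1), (1, 1).
        double[] xs = {0.0, 0.5, 1.0};
        double[] ys = {0.0, 1.0, 1.0};
        System.out.println(areaUnderCurve(xs, ys)); // prints 0.75
    }
}
```

Each segment contributes its width times the mean of its two heights, so the example curve yields 0.25 + 0.5.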

ARPACK

ARPACK routines for MLlib's vectors and matrices.

ArrayImplicits

Implicit methods related to Scala Array.

ArrayImplicits.SparkArrayOps<T>

ArrayType

ArrowColumnVector

A column vector backed by Apache Arrow.

ArrowUtils

ArtifactUtils

AskPermissionToCommitOutput

AssociationRules

Generates association rules from an RDD[FreqItemset[Item]].

AssociationRules.Rule<Item>

An association rule between sets of items.

AsyncEventQueue

An asynchronous queue for events.

AsyncRDDActions<T>

A set of asynchronous RDD actions available through an implicit conversion.

Attribute

Abstract class for ML attributes.

AttributeFactory

Trait for ML attribute factories.

AttributeGroup

Attributes that describe a vector ML column.

AttributeKeys

Keys used to store attributes.

AttributeType

An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.

Avg

An aggregate function that returns the mean of all the values in a group.

BackoffStrategy

A BackoffStrategy determines the backoff duration (how long we should wait) for retries after failures.

BarrierCoordinatorMessage

BarrierTaskContext

:: Experimental :: A TaskContext with extra contextual info and tooling for tasks in a barrier stage.

BarrierTaskInfo

:: Experimental :: Carries all task infos of a barrier task.

BaseAppResource

Base class for resource handlers that use app-specific data.

BaseReadWrite

Trait for MLWriter and MLReader.

BaseRelation

Represents a collection of tuples with a known schema.

BaseRRDD<T,U>

BaseStreamingAppResource

Base class for streaming API handlers, provides easy access to the streaming listener that holds the app's information.

BasicBlockReplicationPolicy

Batch

A physical representation of a data source scan for batch queries.

BatchInfo

:: DeveloperApi :: Class having information on completed batches.

BatchReadOptions

Options for a batch read of an input.

BatchStatus

BatchTableWrite

A `FlowExecution` that writes a batch `DataFrame` to a `Table`.

BatchWrite

An interface that defines how to write the data to data source for batch processing.

BernoulliCellSampler<T>

:: DeveloperApi :: A sampler based on Bernoulli trials for partitioning a data sequence.

BernoulliSampler<T>

:: DeveloperApi :: A sampler based on Bernoulli trials.

Binarizer

Binarize a column of continuous features given a threshold.

BinaryAttribute

A binary attribute.

BinaryClassificationEvaluator

Evaluator for binary classification, which expects input columns rawPrediction, label and an optional weight column.

BinaryClassificationMetricComputer

Trait for a binary classification evaluation metric computer.

BinaryClassificationMetrics

Evaluator for binary classification.

BinaryClassificationSummary

Abstraction for binary classification results for a given model.

BinaryConfusionMatrix

Trait for a binary confusion matrix.

BinaryLogisticRegressionSummary

Abstraction for binary logistic regression results for a given model.

BinaryLogisticRegressionSummaryImpl

Binary logistic regression results for a given model.

BinaryLogisticRegressionTrainingSummary

Abstraction for binary logistic regression training results.

BinaryLogisticRegressionTrainingSummaryImpl

Binary logistic regression training results.

BinaryRandomForestClassificationSummary

Abstraction for BinaryRandomForestClassification results for a given model.

BinaryRandomForestClassificationSummaryImpl

Binary RandomForestClassification for a given model.

BinaryRandomForestClassificationTrainingSummary

Abstraction for BinaryRandomForestClassification training results.

BinaryRandomForestClassificationTrainingSummaryImpl

Binary RandomForestClassification training results.

BinarySample

Class that represents the group and value of a sample.

BinaryType

The data type representing Array[Byte] values.

BinomialBounds

Utility functions that help us determine bounds on adjusted sampling rate to guarantee exact sample size with high confidence when sampling without replacement.

BisectingKMeans

A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.

BisectingKMeans

A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.

BisectingKMeansModel

Model fitted by BisectingKMeans.

BisectingKMeansModel

Clustering model produced by BisectingKMeans.

BisectingKMeansModel.SaveLoadV1_0$

BisectingKMeansModel.SaveLoadV2_0$

BisectingKMeansModel.SaveLoadV3_0$

BisectingKMeansParams

Common params for BisectingKMeans and BisectingKMeansModel

BisectingKMeansSummary

Summary of BisectingKMeans.

BLAS

BLAS routines for MLlib's vectors and matrices.

BLAS

BLAS routines for MLlib's vectors and matrices.

BlockData

Abstracts away how blocks are stored and provides different ways to read the underlying block data.

BlockEvictionHandler

BlockGeneratorListener

Listener object for BlockGenerator events

BlockId

:: DeveloperApi :: Identifies a particular Block of data, usually associated with a single file.

BlockInfoWrapper

BlockManagerId

:: DeveloperApi :: This class represents a unique identifier for a BlockManager.

BlockManagerMessages

BlockManagerMessages.BlockLocationsAndStatus

The response message of GetLocationsAndStatus request.

BlockManagerMessages.BlockLocationsAndStatus$

BlockManagerMessages.BlockManagerHeartbeat

BlockManagerMessages.BlockManagerHeartbeat$

BlockManagerMessages.DecommissionBlockManager$

BlockManagerMessages.DecommissionBlockManagers

BlockManagerMessages.DecommissionBlockManagers$

BlockManagerMessages.GetBlockStatus

BlockManagerMessages.GetBlockStatus$

BlockManagerMessages.GetExecutorEndpointRef

BlockManagerMessages.GetExecutorEndpointRef$

BlockManagerMessages.GetLocations

BlockManagerMessages.GetLocations$

BlockManagerMessages.GetLocationsAndStatus

BlockManagerMessages.GetLocationsAndStatus$

BlockManagerMessages.GetLocationsMultipleBlockIds

BlockManagerMessages.GetLocationsMultipleBlockIds$

BlockManagerMessages.GetMatchingBlockIds

BlockManagerMessages.GetMatchingBlockIds$

BlockManagerMessages.GetMemoryStatus$

BlockManagerMessages.GetPeers

BlockManagerMessages.GetPeers$

BlockManagerMessages.GetRDDBlockVisibility

BlockManagerMessages.GetRDDBlockVisibility$

BlockManagerMessages.GetReplicateInfoForRDDBlocks

BlockManagerMessages.GetReplicateInfoForRDDBlocks$

BlockManagerMessages.GetShufflePushMergerLocations

BlockManagerMessages.GetShufflePushMergerLocations$

BlockManagerMessages.GetStorageStatus$

BlockManagerMessages.IsExecutorAlive

BlockManagerMessages.IsExecutorAlive$

BlockManagerMessages.MarkRDDBlockAsVisible

BlockManagerMessages.MarkRDDBlockAsVisible$

BlockManagerMessages.RegisterBlockManager

BlockManagerMessages.RegisterBlockManager$

BlockManagerMessages.RemoveBlock

BlockManagerMessages.RemoveBlock$

BlockManagerMessages.RemoveBroadcast

BlockManagerMessages.RemoveBroadcast$

BlockManagerMessages.RemoveExecutor

BlockManagerMessages.RemoveExecutor$

BlockManagerMessages.RemoveRdd

BlockManagerMessages.RemoveRdd$

BlockManagerMessages.RemoveShuffle

BlockManagerMessages.RemoveShuffle$

BlockManagerMessages.RemoveShufflePushMergerLocation

BlockManagerMessages.RemoveShufflePushMergerLocation$

BlockManagerMessages.ReplicateBlock

BlockManagerMessages.ReplicateBlock$

BlockManagerMessages.StopBlockManagerMaster$

BlockManagerMessages.ToBlockManagerMaster

BlockManagerMessages.ToBlockManagerMasterStorageEndpoint

BlockManagerMessages.TriggerHeapHistogram$

Driver to Executor message to get a heap histogram.

BlockManagerMessages.TriggerThreadDump$

Driver to Executor message to trigger a thread dump.

BlockManagerMessages.UpdateBlockInfo

BlockManagerMessages.UpdateBlockInfo$

BlockManagerMessages.UpdateRDDBlockTaskInfo

BlockManagerMessages.UpdateRDDBlockTaskInfo$

BlockManagerMessages.UpdateRDDBlockVisibility

BlockManagerMessages.UpdateRDDBlockVisibility$

BlockMatrix

Represents a distributed matrix in blocks of local matrices.

BlockNotFoundException

BlockReplicationPolicy

:: DeveloperApi :: BlockReplicationPrioritization provides logic for prioritizing a sequence of peers for replicating blocks.

BlockReplicationUtils

BlockStatus

BlockUpdatedInfo

:: DeveloperApi :: Stores information about a block status in a block manager.

BloomFilter

A Bloom filter is a space-efficient probabilistic data structure that offers an approximate containment test with one-sided error: if it claims that an item is contained in it, this might be in error, but if it claims that an item is not contained in it, then this is definitely true.
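
The one-sided error described above can be seen in a toy Java sketch. This is not Spark's BloomFilter class; `ToyBloomFilter`, its constructor parameters, and the double-hashing scheme built on `String.hashCode()` are assumptions chosen only for illustration.

```java
import java.util.BitSet;

// Hypothetical toy Bloom filter; not Spark's org.apache.spark.util.sketch.BloomFilter.
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derives the i-th bit index from two base hashes (double hashing).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Sets every bit that the item hashes to.
    void put(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Because `put` only ever sets bits, an item that was added always passes `mightContain`. An item that was never added can still pass if other items happened to set all of its bits, which is the false-positive case.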

BloomFilter.Version

BooleanParam

Specialized version of Param[Boolean] for Java.

BooleanType

The data type representing Boolean values.

BooleanTypeExpression

BoostingStrategy

Configuration options for GradientBoostedTrees.

BoundedDouble

A Double value with error bars and associated confidence.

BoundFunction

Represents a function that is bound to an input type.

BoundProcedure

A procedure that is bound to input types.

BreakingChangeInfo

Additional information if the error was caused by a breaking change.

BreezeUtil

In-place DGEMM and DGEMV for Breeze

Broadcast<T>

A broadcast variable.

BroadcastBlockId

BroadcastFactory

An interface for all the broadcast implementations in Spark (to allow multiple broadcast implementations).

BucketedRandomProjectionLSH

This BucketedRandomProjectionLSH implements Locality Sensitive Hashing functions for Euclidean distance metrics.

BucketedRandomProjectionLSHModel

Model produced by BucketedRandomProjectionLSH, where multiple random vectors are stored.

BucketedRandomProjectionLSHModel.Data$

BucketedRandomProjectionLSHParams

Params for BucketedRandomProjectionLSH.

Bucketizer

Bucketizer maps a column of continuous features to a column of feature buckets.

BufferReleasingInputStream

Helper class that ensures a ManagedBuffer is released upon InputStream.close() and also detects stream corruption if streamCompressedOrEncrypted is true

ByteExactNumeric

ByteType

The data type representing Byte values.

ByteTypeExpression

CachedBatch

Basic interface that all cached batches of data must support.

CachedBatchSerializer

Provides APIs that handle transformations of SQL data associated with the cache/persist APIs.

CacheId

CalendarInterval

The class representing calendar intervals.

CalendarIntervalType

The data type representing calendar intervals.

CaseInsensitiveStringMap

Case-insensitive map of string keys to string values.

Cast

Represents a cast expression in the public logical expression API.

Catalog

Catalog interface for Spark.

CatalogExtension

An API to extend the Spark built-in session catalog.

CatalogMetadata

A catalog in Spark, as returned by the listCatalogs method defined in Catalog.

CatalogNotFoundException

CatalogPlugin

A marker interface to provide a catalog implementation for Spark.

Catalogs

CatalogV2Implicits

Conversion helpers for working with v2 CatalogPlugin.

CatalogV2Implicits.BucketSpecHelper

CatalogV2Implicits.CatalogHelper

CatalogV2Implicits.ClusterByHelper

CatalogV2Implicits.ColumnsHelper

CatalogV2Implicits.FunctionIdentifierHelper

CatalogV2Implicits.IdentifierHelper

CatalogV2Implicits.MultipartIdentifierHelper

CatalogV2Implicits.NamespaceHelper

CatalogV2Implicits.PartitionTypeHelper

CatalogV2Implicits.TableIdentifierHelper

CatalogV2Implicits.TransformHelper

CatalogV2Util

CatalystScan

:: Experimental :: An interface for experimenting with a more direct connection to the query planner.

CategoricalSplit

Split which tests a categorical feature.

CausedBy

Extractor Object for pulling out the root cause of an error.

Check

A CHECK constraint.

CheckpointState

Enumeration to manage state transitions of an RDD through checkpointing

ChildFirstURLClassLoader

A mutable class loader that gives preference to its own URLs over the parent class loader when loading classes and resources.

ChiSqSelector

Deprecated. Use UnivariateFeatureSelector instead.

ChiSqSelector

Creates a ChiSquared feature selector.

ChiSqSelectorModel

Model fitted by ChiSqSelector.

ChiSqSelectorModel

Chi Squared selector model.

ChiSqSelectorModel.ChiSqSelectorModelWriter

ChiSqSelectorModel.Data$

ChiSqSelectorModel.SaveLoadV1_0$

ChiSqTest

Conduct the chi-squared test for the input RDDs using the specified method.

ChiSqTest.Method

param: name String name for the method.

ChiSqTest.Method$

ChiSqTest.NullHypothesis$

ChiSqTestResult

Object containing the test results for the chi-squared hypothesis test.

ChiSquareTest

Chi-square hypothesis testing for categorical data.

CholeskyDecomposition

Compute Cholesky decomposition.

CircularDependencyException

Raised when there's a circular dependency in the current pipeline.

ClassificationLoss

ClassificationModel<FeaturesType,M extends ClassificationModel<FeaturesType,M>>

Model produced by a Classifier.

ClassificationModel

Represents a classification model that predicts to which of a set of categories an example belongs.

ClassificationSummary

Abstraction for multiclass classification results for a given model.

Classifier<FeaturesType,E extends Classifier<FeaturesType,E,M>,M extends ClassificationModel<FeaturesType,M>>

Single-label binary or multiclass classification.

ClassifierParams

(private[spark]) Params for classification.

CleanerListener

Listener class used when any item has been cleaned by the Cleaner class.

CleanupTask

Classes that represent cleaning tasks.

CleanupTaskWeakReference

A WeakReference associated with a CleanupTask.

Clock

An interface to represent clocks, so that they can be mocked out in unit tests.

ClosureCleaner

A cleaner that renders closures serializable if this can be done safely.

ClusterByTransform

This class represents a transform for ClusterBySpec.

ClusteredDistribution

A distribution where tuples that share the same values for clustering expressions are co-located in the same partition.

ClusteringEvaluator

Evaluator for clustering results.

ClusteringMetrics

Metrics for clustering, which expects two input columns: prediction and label.

ClusteringSummary

Summary of clustering algorithms.

CoarseGrainedClusterMessage

CoarseGrainedClusterMessages

CoarseGrainedClusterMessages.AddWebUIFilter

CoarseGrainedClusterMessages.AddWebUIFilter$

CoarseGrainedClusterMessages.DecommissionExecutor$

CoarseGrainedClusterMessages.DecommissionExecutorsOnHost

CoarseGrainedClusterMessages.DecommissionExecutorsOnHost$

CoarseGrainedClusterMessages.ExecutorDecommissioning

CoarseGrainedClusterMessages.ExecutorDecommissioning$

CoarseGrainedClusterMessages.ExecutorDecommissionSigReceived$

CoarseGrainedClusterMessages.GetExecutorLossReason

CoarseGrainedClusterMessages.GetExecutorLossReason$

CoarseGrainedClusterMessages.IsExecutorAlive

CoarseGrainedClusterMessages.IsExecutorAlive$

CoarseGrainedClusterMessages.KillExecutors

CoarseGrainedClusterMessages.KillExecutors$

CoarseGrainedClusterMessages.KillExecutorsOnHost

CoarseGrainedClusterMessages.KillExecutorsOnHost$

CoarseGrainedClusterMessages.KillTask

CoarseGrainedClusterMessages.KillTask$

CoarseGrainedClusterMessages.LaunchedExecutor

CoarseGrainedClusterMessages.LaunchedExecutor$

CoarseGrainedClusterMessages.LaunchTask

CoarseGrainedClusterMessages.LaunchTask$

CoarseGrainedClusterMessages.MiscellaneousProcessAdded

CoarseGrainedClusterMessages.MiscellaneousProcessAdded$

CoarseGrainedClusterMessages.RegisterClusterManager

CoarseGrainedClusterMessages.RegisterClusterManager$

CoarseGrainedClusterMessages.RegisterExecutor

CoarseGrainedClusterMessages.RegisterExecutor$

CoarseGrainedClusterMessages.RemoveExecutor

CoarseGrainedClusterMessages.RemoveExecutor$

CoarseGrainedClusterMessages.RemoveWorker

CoarseGrainedClusterMessages.RemoveWorker$

CoarseGrainedClusterMessages.RequestExecutors

CoarseGrainedClusterMessages.RequestExecutors$

CoarseGrainedClusterMessages.RetrieveDelegationTokens$

CoarseGrainedClusterMessages.RetrieveLastAllocatedExecutorId$

CoarseGrainedClusterMessages.RetrieveSparkAppConfig

CoarseGrainedClusterMessages.RetrieveSparkAppConfig$

CoarseGrainedClusterMessages.ReviveOffers$

CoarseGrainedClusterMessages.SetupDriver

CoarseGrainedClusterMessages.SetupDriver$

CoarseGrainedClusterMessages.ShufflePushCompletion

CoarseGrainedClusterMessages.ShufflePushCompletion$

CoarseGrainedClusterMessages.Shutdown

CoarseGrainedClusterMessages.Shutdown$

CoarseGrainedClusterMessages.SparkAppConfig

CoarseGrainedClusterMessages.SparkAppConfig$

CoarseGrainedClusterMessages.StatusUpdate

CoarseGrainedClusterMessages.StatusUpdate$

CoarseGrainedClusterMessages.StopDriver$

CoarseGrainedClusterMessages.StopExecutor$

CoarseGrainedClusterMessages.StopExecutors$

CoarseGrainedClusterMessages.TaskThreadDump

CoarseGrainedClusterMessages.TaskThreadDump$

CoarseGrainedClusterMessages.UpdateDelegationTokens

CoarseGrainedClusterMessages.UpdateDelegationTokens$

CoarseGrainedClusterMessages.UpdateExecutorLogLevel

CoarseGrainedClusterMessages.UpdateExecutorLogLevel$

CoarseGrainedClusterMessages.UpdateExecutorsLogLevel

CoarseGrainedClusterMessages.UpdateExecutorsLogLevel$

CodegenMetrics

Metrics for code generation.

CoGroupedRDD<K>

:: DeveloperApi :: An RDD that cogroups its parents.

CoGroupFunction<K,V1,V2,R>

A function that returns zero or more output records from each grouping key and its values from 2 Datasets.

CollatedEqualNullSafe

Collation aware equivalent of EqualNullSafe.

CollatedEqualTo

Collation aware equivalent of EqualTo.

CollatedFilter

Base class for collation aware string filters.

CollatedGreaterThan

Collation aware equivalent of GreaterThan.

CollatedGreaterThanOrEqual

Collation aware equivalent of GreaterThanOrEqual.

CollatedIn

Collation aware equivalent of In.

CollatedLessThan

Collation aware equivalent of LessThan.

CollatedLessThanOrEqual

Collation aware equivalent of LessThanOrEqual.

CollatedStringContains

Collation aware equivalent of StringContains.

CollatedStringEndsWith

Collation aware equivalent of StringEndsWith.

CollatedStringStartsWith

Collation aware equivalent of StringStartsWith.

CollectionAccumulator<T>

An accumulator for collecting a list of elements.

CollectionsUtils

Column

A column in Spark, as returned by listColumns method in Catalog.

Column

A column that will be computed based on the data in a DataFrame.

Column

An interface representing a column of a Table.

ColumnarArray

Array abstraction in ColumnVector.

ColumnarBatch

This class wraps multiple ColumnVectors as a row-wise table.

ColumnarBatchRow

This class wraps an array of ColumnVector and provides a row view.

ColumnarMap

Map abstraction in ColumnVector.

ColumnarRow

Row abstraction in ColumnVector.

ColumnDefaultValue

A class representing the default value of a column.

ColumnName

A convenient class used for constructing schema.

ColumnPruner

Utility transformer for removing temporary columns from a DataFrame.

ColumnPruner.Data$

ColumnStatistics

An interface to represent column statistics, which is part of Statistics.

ColumnVector

An interface representing in-memory columnar data in Spark.

CommandLineLoggingUtils

CommandLineUtils

Contains basic command line parsing functionality and methods to parse some common Spark CLI options.

CompleteFlow

A Flow that declares exactly what data should be in the target table.

ComplexFutureAction<T>

A FutureAction for actions that could trigger multiple Spark jobs.

CompositeReadLimit

Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of rows with at least the given minimum number of rows.

CompressionCodec

:: DeveloperApi :: CompressionCodec allows the customization of choosing different compression implementations to be used in block storage.

Configurable

A trait to implement Configurable interface.

ConnectedComponents

Connected components algorithm.

ConstantInputDStream<T>

An input stream that always returns the same RDD on each time step.

Constraint

A constraint that restricts states of data in a table.

Constraint.ValidationStatus

An indicator of the validity of the constraint.

ConstructPipelineEvent

A factory object that is used to construct PipelineEvents with common fields automatically filled in.

ContextAwareIterator<T>

Deprecated. Since 4.0.0, as its only usage for Python evaluation is now extinct.

ContextBarrierId

For each barrier stage attempt, only at most one barrier() call can be active at any time, thus we can use (stageId, stageAttemptId) to identify the stage attempt where the barrier() call is from.

ContinuousPartitionReader<T>

A variation on PartitionReader for use with continuous streaming processing.

ContinuousPartitionReaderFactory

A variation on PartitionReaderFactory that returns ContinuousPartitionReader instead of PartitionReader.

ContinuousSplit

Split which tests a continuous feature.

ContinuousStream

A SparkDataStream for streaming queries with continuous mode.

CoordinateMatrix

Represents a matrix in coordinate format.

CoreDataflowNodeProcessor

Processor that is responsible for analyzing each flow and sorting the nodes in topological order

Correlation

API for correlation functions in MLlib, compatible with DataFrames and Datasets.

Correlation

Trait for correlation algorithms.

CorrelationNames

Maintains supported and default correlation names.

Correlations

Delegates computation to the specific correlation object based on the input method name.

CosineSilhouette

An efficient and parallel implementation of the Silhouette using the cosine distance measure.

Count

An aggregate function that returns the number of the specific row in a group.

CountingWritableChannel

CountMinSketch

A Count-min sketch is a probabilistic data structure used for frequency estimation using sub-linear space.

CountMinSketch.Version

CountStar

An aggregate function that returns the number of rows in a group.

CountVectorizer

Extracts a vocabulary from document collections and generates a CountVectorizerModel.

CountVectorizerModel

Converts a text document to a sparse vector of token counts.

CountVectorizerModel.Data$

CountVectorizerParams

Params for CountVectorizer and CountVectorizerModel.

CreatableRelationProvider

CreateTableWriter<T>

Trait to restrict calls to create and replace operations.

CrossValidator

K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds which are used as separate training and test datasets. For example, with k=3 folds, K-fold cross validation will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.
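
The fold construction described above can be sketched in plain Java. This is not how Spark's CrossValidator is implemented; `KFoldSketch`, `Split`, and `splits` are hypothetical names, and rows are represented only by their integer indices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of k-fold index splitting; not Spark's CrossValidator.
final class KFoldSketch {
    /** Holds the row indices of one (training, test) pair. */
    static final class Split {
        final List<Integer> train;
        final List<Integer> test;

        Split(List<Integer> train, List<Integer> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Shuffles indices 0..n-1 and deals them into k non-overlapping folds. */
    static List<Split> splits(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<Split> result = new ArrayList<>();
        for (int fold = 0; fold < k; fold++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int pos = 0; pos < n; pos++) {
                // Position pos belongs to fold (pos % k); that fold is the test set once.
                if (pos % k == fold) {
                    test.add(indices.get(pos));
                } else {
                    train.add(indices.get(pos));
                }
            }
            result.add(new Split(train, test));
        }
        return result;
    }
}
```

With n=9 and k=3, each of the 3 pairs holds 6 training indices and 3 test indices, and every index appears in exactly one test set.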

CrossValidatorModel

CrossValidatorModel contains the model with the highest average cross-validation metric across folds and uses this model to transform input data.

CrossValidatorModel.CrossValidatorModelWriter

Writer for CrossValidatorModel.

CrossValidatorParams

Params for CrossValidator and CrossValidatorModel.

CryptoStreamUtils

A util class for manipulating IO encryption and decryption streams.

CustomAvgMetric

Built-in `CustomMetric` that computes average of metric values.

CustomMetric

A custom metric.

CustomSumMetric

Built-in `CustomMetric` that sums up metric values.

CustomTaskMetric

A custom task metric.

DAGSchedulerEvent

Types of events that can be handled by the DAGScheduler.

Database

A database in Spark, as returned by the listDatabases method defined in Catalog.

DatabricksDialect

DataflowGraph

DataflowGraph represents the core graph structure for Spark declarative pipelines.

DataflowGraphTransformer

Resolves the DataflowGraph by processing each node in the graph.

DataflowGraphTransformer.TransformNodeFailedException

Exception thrown when transforming a node in the graph fails with a non-retryable error.

DataflowGraphTransformer.TransformNodeFailedException$

DataflowGraphTransformer.TransformNodeRetryableException

Exception thrown when transforming a node in the graph fails because at least one of its dependencies wasn't yet transformed.

DataflowGraphTransformer.TransformNodeRetryableException$

DataFrameNaFunctions

Functionality for working with missing data in DataFrames.

DataFrameReader

Interface used to load a Dataset from external storage systems.

DataFrameStatFunctions

Statistic functions for DataFrames.

DataFrameWriter<T>

Interface used to write a Dataset to external storage systems.

DataFrameWriterV2<T>

Interface used to write a Dataset to external storage using the v2 API.

Dataset<T>

A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations.

Dataset

A type of Output that represents a materialized dataset in a DataflowGraph.

DatasetHolder<T>

A container for a Dataset, used for implicit conversions in Scala.

DatasetManager

DatasetManager is responsible for materializing tables in the catalog based on the given graph.

DatasetManager.TableMaterializationException

Wraps table materialization exceptions.

DatasetManager.TableMaterializationException$

DatasetType

DatasetType.MATERIALIZED_VIEW$

DatasetType.STREAMING_TABLE$

DatasetUtils

DataSourceRegister

Data sources should implement this trait so that they can register an alias to their data source.

DataStreamReader

Interface used to load a streaming Dataset from external storage systems.

DataStreamWriter<T>

Interface used to write a streaming Dataset to external storage systems.

DataType

The base type of all Spark SQL data types.

DataTypes

To get/create specific data type, users should use singleton objects and factory methods provided by this class.

DataValidators

A collection of methods used to validate data before applying ML algorithms.

DataWriter<T>

A data writer returned by DataWriterFactory.createWriter(int, long) and is responsible for writing data for an input RDD partition.

DataWriterFactory

A factory of DataWriter returned by BatchWrite.createBatchWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing the actual data writer at executor side.

DateType

The date type represents a valid date in the proleptic Gregorian calendar.

DateTypeExpression

DayTimeIntervalType

The type represents day-time intervals of the SQL standard.

DB2Dialect

DCT

A feature transformer that takes the 1D discrete cosine transform of a real vector.

Decimal

A mutable implementation of BigDecimal that can hold a Long if values are small enough.

Decimal.DecimalAsIfIntegral$

An Integral evidence parameter for Decimals.

Decimal.DecimalIsConflicted

Common methods for Decimal evidence parameters

Decimal.DecimalIsFractional$

A Fractional evidence parameter for Decimals.

DecimalExactNumeric

DecimalExpression

DecimalType

The data type representing java.math.BigDecimal values.

DecimalType.Fixed$

DecisionTree

A class which implements a decision tree learning algorithm for classification and regression.

DecisionTreeClassificationModel

Decision tree model (http://en.wikipedia.org/wiki/Decision_tree_learning) for classification.

DecisionTreeClassifier

Decision tree learning algorithm (http://en.wikipedia.org/wiki/Decision_tree_learning) for classification.

DecisionTreeClassifierParams

DecisionTreeModel

Abstraction for Decision Tree models.

DecisionTreeModel

Decision tree model for classification or regression.

DecisionTreeModel.SaveLoadV1_0$

DecisionTreeModelReadWrite

Helper classes for tree model persistence

DecisionTreeModelReadWrite.NodeData

Info for a Node

DecisionTreeModelReadWrite.NodeData$

DecisionTreeModelReadWrite.SplitData

Info for a Split

DecisionTreeModelReadWrite.SplitData$

DecisionTreeParams

Parameters for Decision Tree-based algorithms.

DecisionTreeRegressionModel

Decision tree (Wikipedia) model for regression.

DecisionTreeRegressor

Decision tree learning algorithm for regression.

DecisionTreeRegressorParams

DefaultCredentials

Returns DefaultAWSCredentialsProviderChain for authentication.

DefaultParamsReadable<T>

Helper trait for making simple Params types readable.

DefaultParamsWritable

Helper trait for making simple Params types writable.

DefaultPartitionCoalescer

Coalesce the partitions of a parent RDD (prev) into fewer partitions, so that each partition of this RDD computes one or more of the parent ones.

DefaultTopologyMapper

A TopologyMapper that assumes all nodes are in the same rack

DefaultValue

A class that represents default values.

DelegatingCatalogExtension

A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly.

DeltaBatchWrite

An interface that defines how to write a delta of rows during batch processing.

DeltaWrite

A logical representation of a data source write that handles a delta of rows.

DeltaWriteBuilder

An interface for building a DeltaWrite.

DeltaWriter<T>

A data writer returned by DeltaWriterFactory.createWriter(int, long) that is responsible for writing a delta of rows.

DeltaWriterFactory

A factory for creating DeltaWriters returned by DeltaBatchWrite.createBatchWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing writers at the executor side.

DenseMatrix

Column-major dense matrix.

DenseMatrix

Column-major dense matrix.
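
As a rough illustration of what "column-major" means here, the sketch below maps a (row, column) pair to a position in a flat values array. The class and method names are hypothetical and are not part of Spark's API; the sketch only assumes that entries are stored one column after another.

```java
// Hypothetical helper, not part of Spark: column-major indexing for a dense matrix.
final class ColumnMajorSketch {
    /** Returns the flat-array index of entry (row, col) when columns are stored contiguously. */
    static int index(int row, int col, int numRows) {
        return row + numRows * col;
    }

    /** Reads entry (row, col) from a column-major values array. */
    static double get(double[] values, int numRows, int row, int col) {
        return values[index(row, col, numRows)];
    }

    public static void main(String[] args) {
        // The 3x2 matrix [[1, 2], [3, 4], [5, 6]] stored column by column.
        double[] values = {1.0, 3.0, 5.0, 2.0, 4.0, 6.0};
        System.out.println(get(values, 3, 1, 1)); // entry at row 1, column 1
    }
}
```

With this layout, walking down a column touches adjacent array slots, while walking along a row jumps by numRows each step.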

DenseVector

A dense vector represented by a value array.

DenseVector

A dense vector represented by a value array.

Dependency<T>

:: DeveloperApi :: Base class for dependencies.

DependencyUtils

DerbyDialect

DeserializationStream

:: DeveloperApi :: A stream for reading serialized objects.

DeserializedMemoryEntry<T>

DeserializedValuesHolder<T>

A holder for storing the deserialized values.

DeterministicLevel

The deterministic level of RDD's output (i.e.

DeterministicLevelSerializer

DifferentiableLossAggregator<Datum,Agg extends DifferentiableLossAggregator<Datum,Agg>>

A parent trait for aggregators used in fitting MLlib models.

DifferentiableRegularization<T>

A Breeze diff function which represents a cost function for differentiable regularization of parameters.

DirectPoolMemory

DiskBlockData

DistributedLDAModel

Distributed model fitted by LDA.

DistributedLDAModel

Distributed LDA model.

DistributedMatrix

Represents a distributively stored matrix backed by one or more RDDs.

Distribution

An interface that defines how data is distributed across partitions.

Distributions

Helper methods to create distributions to pass into Spark.

Dot

DoubleAccumulator

An accumulator for computing sum, count, and averages for double precision floating numbers.

DoubleAccumulatorSource

DoubleArrayArrayParam

Specialized version of Param[Array[Array[Double]]] for Java.

DoubleArrayParam

Specialized version of Param[Array[Double]] for Java.

DoubleExactNumeric

DoubleFlatMapFunction<T>

A function that returns zero or more records of type Double from each input record.

DoubleFunction<T>

A function that returns Doubles, and can be used to construct DoubleRDDs.

DoubleParam

Specialized version of Param[Double] for Java.

DoubleRDDFunctions

Extra functions available on RDDs of Doubles through an implicit conversion.

DoubleType

The data type representing Double values.

DoubleType.DoubleAsIfIntegral

DoubleType.DoubleAsIfIntegral$

DoubleType.DoubleIsConflicted

DoubleTypeExpression

DriverPlugin

:: DeveloperApi :: Driver component of a SparkPlugin.

DStream<T>

A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).

DummyInvocationHandler

DummySerializerInstance

Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.

Duration

Durations

Edge<ED>

A single directed edge consisting of a source id, target id, and the data associated with the edge.

EdgeActiveness

Criteria for filtering edges based on activeness.

EdgeContext<VD,ED,A>

Represents an edge along with its neighboring vertices and allows sending messages along the edge.

EdgeDirection

The direction of a directed edge relative to a vertex.

EdgeInterpolationAlgorithm

Edge interpolation algorithm for Geography logical type.

EdgeInterpolationAlgorithm.SPHERICAL$

EdgeRDD<ED>

EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance.

EdgeRDDImpl<ED,VD>

EdgeTriplet<VD,ED>

An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.

EigenValueDecomposition

Compute eigen-decomposition.

ElementwiseProduct

Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector.

ElementwiseProduct

Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector.
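
The Hadamard product itself can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic on raw arrays; the class and method names are hypothetical, and it does not use or reproduce Spark's transformer.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: element-wise (Hadamard) product of two vectors.
final class HadamardSketch {
    /** Multiplies each input component by the weight at the same position. */
    static double[] hadamard(double[] input, double[] weights) {
        if (input.length != weights.length) {
            throw new IllegalArgumentException("input and weights must have the same length");
        }
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * weights[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] scaled = hadamard(new double[]{1.0, 2.0, 3.0}, new double[]{0.0, 1.0, 2.0});
        System.out.println(Arrays.toString(scaled));
    }
}
```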

EMLDAOptimizer

Optimizer for EM algorithm which stores data + parameter graph, plus algorithm parameters.

EmptyTerm

Placeholder term for the result of undefined interactions, e.g.

Encoder<T>

Used to convert a JVM object of type T to and from the internal Spark SQL representation.

EncoderImplicits

EncoderImplicits used to implicitly generate SQL Encoders.

Encoders

Methods for creating an Encoder.

EnsembleCombiningStrategy

Enum to select ensemble combining strategy for base learners

EnsembleModelReadWrite

EnsembleModelReadWrite.EnsembleNodeData

Info for one Node in a tree ensemble

EnsembleModelReadWrite.EnsembleNodeData$

Entropy

Class for calculating entropy during multiclass classification.

EnumUtil

EqualNullSafe

Performs equality comparison, similar to EqualTo.
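
The difference between ordinary and null-safe equality can be sketched as follows, assuming SQL-style semantics in which a plain equality test involving NULL yields an unknown result. The names below are hypothetical, and a Java null Boolean stands in for "unknown"; this is not Spark's filter implementation.

```java
import java.util.Objects;

// Hypothetical helper, not part of Spark: SQL-style equality versus null-safe equality.
final class NullSafeEqualitySketch {
    /** Plain equality: the result is unknown (null) if either operand is null. */
    static Boolean equalTo(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    /** Null-safe equality: true if both are null, false if exactly one is null. */
    static boolean equalNullSafe(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(equalTo(null, null));       // unknown
        System.out.println(equalNullSafe(null, null)); // definite answer
    }
}
```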

EqualTo

A filter that evaluates to true iff the column evaluates to a value equal to value.

ErrorClassesJsonReader

A reader to load error information from one or more JSON files.

ErrorInfo

Information associated with an error class.

ErrorMessageFormat

ErrorStateInfo

Information associated with an error state / SQLSTATE.

ErrorSubInfo

Information associated with an error subclass.

Estimator<M extends Model<M>>

Abstract class for estimators that fit models to data.

EstimatorUtils

Evaluator

Abstract class for evaluators that compute metrics from predictions.

EventDetails

EventHelpers

Contains helpers and implicits for working with PipelineEvents.

:: DeveloperApi :: Task failed due to a runtime exception.

ExcludedExecutor

ExecutionData

ExecutionListenerManager

Manager for QueryExecutionListener.

ExecutionResult

A flow's execution may complete for two reasons: 1.

ExecutionResult.FINISHED$

ExecutionResult.STOPPED$

ExecutorInfo

:: DeveloperApi :: Stores information about an executor to pass from the scheduler to SparkListeners.

ExecutorKilled

ExecutorLossMessage

ExecutorLostFailure

:: DeveloperApi :: The task failed because the executor that it was running on was lost.

ExecutorMetricsDistributions

ExecutorMetricsSerializer

ExecutorMetricType

Executor metric types for executor-level metrics stored in ExecutorMetrics.

ExecutorPeakMetricsDistributions

ExecutorPlugin

:: DeveloperApi :: Executor component of a SparkPlugin.

ExecutorRegistered

ExecutorRemoved

ExecutorResourceRequest

An Executor resource request.

ExecutorResourceRequests

A set of Executor resource requests.

ExecutorStageSummary

ExecutorStageSummarySerializer

ExecutorStreamSummary

ExecutorSummary

ExpectationAggregator

ExpectationAggregator computes the partial expectation results.

ExpectationSum

ExperimentalMethods

:: Experimental :: Holder for experimental methods for the bravest.

ExpireDeadHosts

ExpiredTimerInfo

Class used to provide access to expired timer's expiry time.

ExponentialBackoffStrategy

A BackoffStrategy where the back-off time grows exponentially for each successive retry.
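
A generic exponential back-off computation might look like the sketch below. The parameter names (initialMillis, factor, maxMillis) and the cap are assumptions made for illustration; the actual parameters and behavior of ExponentialBackoffStrategy are not described in this listing.

```java
// Hypothetical helper, not Spark's ExponentialBackoffStrategy: a generic capped exponential back-off.
final class BackoffSketch {
    /**
     * Returns the wait time before the given retry (0-based), multiplying the
     * initial delay by the factor once per retry and capping it at maxMillis.
     */
    static long delayMillis(int retry, long initialMillis, double factor, long maxMillis) {
        double delay = initialMillis * Math.pow(factor, retry);
        return (long) Math.min(delay, (double) maxMillis);
    }

    public static void main(String[] args) {
        for (int retry = 0; retry < 5; retry++) {
            System.out.println("retry " + retry + ": " + delayMillis(retry, 100L, 2.0, 10_000L) + " ms");
        }
    }
}
```

The cap keeps the delay bounded once the exponential term becomes large.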

ExponentialGenerator

Generates i.i.d.

ExposedBufferByteArrayOutputStream

Subclass of ByteArrayOutputStream that exposes `buf` directly.

Expression

Base class of the public logical expression API.

Expressions

Helper methods to create logical transforms to pass into Spark.

ExtendedExplainGenerator

A trait for a session extension to implement that provides additional explain plan information.

ExternalClusterManager

A cluster manager interface for plugging in an external scheduler.

ExternalCommandRunner

An interface to execute an arbitrary string command inside an external execution engine rather than Spark.

Extract

Represent an extract function, which extracts and returns the value of a specified datetime field from a datetime or interval value expression.

ExtractableLiteral

FactorizationMachines

FactorizationMachinesParams

Params for Factorization Machines

FailureStoppingFlow

Indicates that there was a failure while stopping the flow.

FailureStoppingOperation

Abstract class used to identify failures that occur while stopping an operation, including timeouts.

FalsePositiveRate

False positive rate.

FeatureHasher

Feature hashing projects a set of categorical or numerical features into a feature vector of specified dimension (typically substantially smaller than that of the original feature space).

FeatureType

Enum to describe whether a feature is "continuous" or "categorical"

FetchFailed

:: DeveloperApi :: Task failed to fetch shuffle data from a remote node.

FileBasedTopologyMapper

A simple file based topology mapper.

Filter

A filter predicate for data sources.

FilterFunction<T>

Base interface for a function used in Dataset's filter function.

FitEnd<M extends Model<M>>

Event fired after Estimator.fit.

FitStart<M extends Model<M>>

Event fired before Estimator.fit.

FixedLength

FlatMapFunction<T,R>

A function that returns zero or more output records from each input record.

FlatMapFunction2<T1,T2,R>

A function that takes two inputs and returns zero or more output records.

FlatMapGroupsFunction<K,V,R>

A function that returns zero or more output records from each grouping key and its values.

FlatMapGroupsWithStateFunction<K,V,S,R>

::Experimental:: Base interface for a map function used in org.apache.spark.sql.KeyValueGroupedDataset.flatMapGroupsWithState(FlatMapGroupsWithStateFunction, org.apache.spark.sql.streaming.OutputMode, org.apache.spark.sql.Encoder, org.apache.spark.sql.Encoder)

FloatExactNumeric

FloatParam

Specialized version of Param[Float] for Java.

FloatType

The data type representing Float values.

FloatType.FloatAsIfIntegral

FloatType.FloatAsIfIntegral$

FloatType.FloatIsConflicted

FloatTypeExpression

Flow

A Flow is a node of data transformation in a dataflow graph.

FlowAnalysis

FlowExecution

A `FlowExecution` specifies how to execute a flow and manages its execution.

FlowFilter

Specifies how we should filter Flows.

FlowFunction

A wrapper for the lambda function that defines a Flow.

FlowFunctionResult

Holds the DataFrame returned by a FlowFunction along with the inputs used to construct it.

FlowNode

param: identifier The identifier of the flow.

FlowPlanner

Plans execution of Flows in a DataflowGraph by converting Flows into 'FlowExecution's.

FlowProgress

FlowProgressEventLogger

This class should be used for all flow progress event logging; it controls the level at which events are logged.

FlowResolver

FlowsForTables

Used in partial graph updates to select flows that flow to "selectedTables".

FlowStatus

FlowStatus.COMPLETED$

Represents the system metadata associated with a Flow.

FMClassificationModel

Model produced by FMClassifier

FMClassificationModel.Data$

FMClassificationSummary

Abstraction for FMClassifier results for a given model.

FMClassificationSummaryImpl

FMClassifier results for a given model.

FMClassificationTrainingSummary

Abstraction for FMClassifier training results.

FMClassificationTrainingSummaryImpl

FMClassifier training results.

FMClassifier

Factorization Machines learning algorithm for classification.

FMClassifierParams

Params for FMClassifier.

FMRegressionModel

Model produced by FMRegressor.

FMRegressionModel.Data$

FMRegressor

Factorization Machines learning algorithm for regression.

FMRegressorParams

Params for FMRegressor

ForeachFunction<T>

Base interface for a function used in Dataset's foreach function.

ForeachPartitionFunction<T>

Base interface for a function used in Dataset's foreachPartition function.

ForeachWriter<T>

The abstract class for writing custom logic to process data generated by a query.

ForeignKey

A FOREIGN KEY constraint.

ForeignKey.Builder

FPGrowth

A parallel FP-growth algorithm to mine frequent itemsets.

FPGrowth

A parallel FP-growth algorithm to mine frequent itemsets.

FPGrowth.FreqItemset<Item>

Frequent itemset.

FPGrowthModel

Model fitted by FPGrowth.

FPGrowthModel<Item>

Model trained by FPGrowth, which holds frequent itemsets.

FPGrowthModel.SaveLoadV1_0$

FPGrowthParams

Common params for FPGrowth and FPGrowthModel

Function<T1,R>

Base interface for functions whose return types do not create special RDDs.

Function

A user-defined function in Spark, as returned by listFunctions method in Catalog.

Function

Base class for user-defined functions.

Function0<R>

A zero-argument function that returns an R.

Function2<T1,T2,R>

A two-argument function that takes arguments of type T1 and T2 and returns an R.

Function3<T1,T2,T3,R>

A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.

Function4<T1,T2,T3,T4,R>

A four-argument function that takes arguments of type T1, T2, T3 and T4 and returns an R.

FunctionCatalog

Catalog methods for working with Functions.

functions

Commonly used functions available for DataFrame operations.

functions

functions.partitioning$

FutureAction<T>

A future for the result of an action to support cancellation.

FValueTest

FValue test for continuous data.

GammaGenerator

Generates i.i.d.

GarbageCollectionMetrics

GaussianMixture

Gaussian Mixture clustering.

GaussianMixture

This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).

GaussianMixtureModel

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i with probability weights(i).

GaussianMixtureModel

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are the respective mean and covariance for each Gaussian distribution i=1..k.

GaussianMixtureModel.Data$

GaussianMixtureParams

Common params for GaussianMixture and GaussianMixtureModel

GaussianMixtureSummary

Summary of GaussianMixture.

GBTClassificationModel

Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting) model for classification.

GBTClassifier

Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting) learning algorithm for classification.

GBTClassifierParams

GBTParams

Parameters for Gradient-Boosted Tree algorithms.

GBTRegressionModel

Gradient-Boosted Trees (GBTs) model for regression.

GBTRegressor

Gradient-Boosted Trees (GBTs) learning algorithm for regression.

GBTRegressorParams

GeneralAggregateFunc

The general implementation of AggregateFunc, which contains the upper-cased function name, the `isDistinct` flag and all the inputs.

GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel>

GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).

GeneralizedLinearModel

GeneralizedLinearModel (GLM) represents a model trained using GeneralizedLinearAlgorithm.

GeneralizedLinearRegression

Fit a Generalized Linear Model (see Generalized linear model (Wikipedia)) specified by giving a symbolic description of the linear predictor (link function) and a description of the error distribution (family).

GeneralizedLinearRegression.Binomial$

Binomial exponential family distribution.

GeneralizedLinearRegression.CLogLog$

GeneralizedLinearRegression.Family$

GeneralizedLinearRegression.FamilyAndLink$

GeneralizedLinearRegression.Gamma$

Gamma exponential family distribution.

GeneralizedLinearRegression.Gaussian$

Gaussian exponential family distribution.

GeneralizedLinearRegression.Identity$

GeneralizedLinearRegression.Inverse$

GeneralizedLinearRegression.Link$

GeneralizedLinearRegression.Log$

GeneralizedLinearRegression.Logit$

GeneralizedLinearRegression.Poisson$

Poisson exponential family distribution.

GeneralizedLinearRegression.Probit$

GeneralizedLinearRegression.Sqrt$

GeneralizedLinearRegression.Tweedie$

GeneralizedLinearRegressionBase

Params for Generalized Linear Regression.

GeneralizedLinearRegressionModel

Model produced by GeneralizedLinearRegression.

GeneralizedLinearRegressionModel.Data$

GeneralizedLinearRegressionSummary

Summary of GeneralizedLinearRegression model and predictions.

GeneralizedLinearRegressionTrainingSummary

Summary of GeneralizedLinearRegression fitting and model.

GeneralMLWritable

Trait for classes that provide GeneralMLWriter.

GeneralMLWriter

A ML Writer which delegates based on the requested format.

GeneralScalarExpression

The general representation of SQL scalar expressions, which contains the upper-cased expression name and all the children expressions.

GeographyType

The data type representing GEOGRAPHY values which are spatial objects, as defined in the Open Geospatial Consortium (OGC) Simple Feature Access specification (https://portal.ogc.org/files/?artifact_id=25355), with a geographic coordinate system.

GeometryType

The data type representing GEOMETRY values which are spatial objects, as defined in the Open Geospatial Consortium (OGC) Simple Feature Access specification (https://portal.ogc.org/files/?artifact_id=25355), with a Cartesian coordinate system.

GetAllReceiverInfo

Gini

Class for calculating the Gini impurity (http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity) during multiclass classification.
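
For reference, the Gini impurity of a node is one minus the sum of squared class proportions. The sketch below computes it from per-class counts; the class and method names are hypothetical, and treating an empty node as impurity 0 is an assumption of this sketch.

```java
// Hypothetical helper, not part of Spark: Gini impurity from per-class counts.
final class GiniSketch {
    /** Returns 1 - sum_i (count_i / total)^2, or 0 when there are no samples. */
    static double gini(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0; // assumption: an empty node is treated as pure
        }
        double sumOfSquares = 0.0;
        for (double c : counts) {
            double p = c / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    public static void main(String[] args) {
        System.out.println(gini(new double[]{5.0, 5.0}));  // evenly mixed two-class node
        System.out.println(gini(new double[]{10.0, 0.0})); // pure node
    }
}
```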

GLMClassificationModel

Helper class for import/export of GLM classification models.

GLMClassificationModel.SaveLoadV1_0$

GLMRegressionModel

Helper methods for import/export of GLM regression models.

GLMRegressionModel.SaveLoadV1_0$

Gradient

Class used to compute the gradient for a loss function, given a single data point.

GradientBoostedTrees

A class that implements Stochastic Gradient Boosting for regression and binary classification.

GradientBoostedTreesModel

Represents a gradient boosted trees model.

GradientDescent

Class used to solve an optimization problem using Gradient Descent.

Graph<VD,ED>

The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges.

GraphElement

An element in a DataflowGraph.

GraphElementTypeUtils

GraphErrors

Collection of errors that can be thrown during graph resolution / analysis.

GraphExecution

GraphExecution.FlowExecutionAction

GraphExecution.FlowExecutionStopReason

Represents the reason why a flow execution should be stopped.

GraphExecution.RetryFlowExecution$

Indicates that the flow execution should be retried.

GraphExecution.StopFlowExecution

Indicates that the flow execution should be stopped with a specific reason.

GraphExecution.StopFlowExecution$

GraphFilter<E>

Specifies how we should filter Graph elements.

GraphGenerators

A collection of graph generating functions.

GraphIdentifierManager

Responsible for properly qualifying the identifiers for datasets inside or referenced by the dataflow graph.

GraphIdentifierManager.DatasetIdentifier

Represents the identifier for a dataset that is defined or referenced in a pipeline.

GraphIdentifierManager.ExternalDatasetIdentifier

Represents the identifier for a dataset that is external to the current pipeline.

GraphIdentifierManager.ExternalDatasetIdentifier$

GraphIdentifierManager.InternalDatasetIdentifier

Represents the identifier for a dataset that is defined by the current pipeline.

GraphIdentifierManager.InternalDatasetIdentifier$

GraphImpl<VD,ED>

An implementation of Graph to support computation on graphs.

GraphLoader

Provides utilities for loading Graphs from files.

GraphOperations

GraphOps<VD,ED>

Contains additional functionality for Graph.

GraphRegistrationContext

A mutable context for registering tables, views, and flows in a dataflow graph.

GraphRegistrationContext.OutputType

GraphValidations

Validations performed on a `DataflowGraph`.

GraphXUtils

GreaterThan

A filter that evaluates to true iff the attribute evaluates to a value greater than value.

GreaterThanOrEqual

A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.

GroupMappingServiceProvider

This Spark trait is used for mapping a given userName to a set of groups which it belongs to.

GroupState<S>

:: Experimental ::

GroupStateTimeout

Represents the type of timeouts possible for the Dataset operations mapGroupsWithState and flatMapGroupsWithState.

HadoopCodecStreams

A utility object to look up Hadoop compression codecs and create input streams.

HadoopDelegationTokenProvider

::DeveloperApi:: Hadoop delegation token provider.

HadoopFSUtils

Utility functions to simplify and speed up file listing.

HadoopRDD<K,V>

:: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the older MapReduce API (org.apache.hadoop.mapred).

HadoopRDD.HadoopMapPartitionsWithSplitRDD$

HasAggregationDepth

Trait for shared param aggregationDepth (default: 2).

HasBlockSize

Trait for shared param blockSize.

HasCheckpointInterval

Trait for shared param checkpointInterval.

HasCollectSubModels

Trait for shared param collectSubModels (default: false).

HasDistanceMeasure

Trait for shared param distanceMeasure (default: "euclidean").

HasElasticNetParam

Trait for shared param elasticNetParam.

HasFeaturesCol

Trait for shared param featuresCol (default: "features").

HasFitIntercept

Trait for shared param fitIntercept (default: true).

HasHandleInvalid

Trait for shared param handleInvalid.

HashingTF

Maps a sequence of terms to their term frequencies using the hashing trick.

HashingTF

Maps a sequence of terms to their term frequencies using the hashing trick.
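
The hashing trick can be sketched as follows: each term is hashed to a bucket index, and the bucket's count is incremented. This sketch uses String.hashCode as a stand-in hash function, which is an assumption for illustration only; the class and method names are hypothetical, and the bucket indices will not match those produced by Spark.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: term frequencies via the hashing trick.
final class HashingTrickSketch {
    /** Counts terms into a fixed-size vector, using a hash of each term as its index. */
    static double[] termFrequencies(List<String> terms, int numFeatures) {
        double[] tf = new double[numFeatures];
        for (String term : terms) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            int index = Math.floorMod(term.hashCode(), numFeatures);
            tf[index] += 1.0;
        }
        return tf;
    }

    public static void main(String[] args) {
        double[] tf = termFrequencies(List.of("a", "b", "a"), 16);
        System.out.println(Arrays.toString(tf));
    }
}
```

Because distinct terms can land in the same bucket, a larger numFeatures reduces collisions at the cost of a wider vector.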

HashPartitioner

A Partitioner that implements hash-based partitioning using Java's Object.hashCode.
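
Hash-based partitioning boils down to taking a key's hashCode modulo the number of partitions and keeping the result non-negative. The sketch below illustrates that idea with hypothetical names; sending null keys to partition 0 is an assumption of this sketch.

```java
// Hypothetical helper, not Spark's HashPartitioner: choose a partition from a key's hashCode.
final class HashPartitionSketch {
    /** Returns a partition id in [0, numPartitions) for the given key. */
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0; // assumption: null keys go to the first partition
        }
        // floorMod avoids negative partition ids for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("a", 4));
        System.out.println(partitionFor(-7, 4));
        System.out.println(partitionFor(null, 4));
    }
}
```

Keys with equal hashCode values always land in the same partition, which is why keys used this way need a stable, content-based hashCode.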

HasInputCol

Trait for shared param inputCol.

HasInputCols

Trait for shared param inputCols.

HasLabelCol

Trait for shared param labelCol (default: "label").

HasLoss

Trait for shared param loss.

HasMaxBlockSizeInMB

Trait for shared param maxBlockSizeInMB (default: 0.0).

HasMaxIter

Trait for shared param maxIter.

HasNumFeatures

Trait for shared param numFeatures (default: 262144).

HasOutputCol

Trait for shared param outputCol (default: uid + "__output").

HasOutputCols

Trait for shared param outputCols.

HasParallelism

Trait to define a level of parallelism for algorithms that are able to use multithreaded execution, and provide a thread-pool based execution context.

HasPartitionKey

A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below).

HasPartitionStatistics

A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below).

HasPredictionCol

Trait for shared param predictionCol (default: "prediction").

HasProbabilityCol

Trait for shared param probabilityCol (default: "probability").

HasRawPredictionCol

Trait for shared param rawPredictionCol (default: "rawPrediction").

HasRegParam

Trait for shared param regParam.

HasRelativeError

Trait for shared param relativeError (default: 0.001).

HasSeed

Trait for shared param seed (default: this.getClass.getName.hashCode.toLong).

HasSolver

Trait for shared param solver.

HasStandardization

Trait for shared param standardization (default: true).

HasStepSize

Trait for shared param stepSize.

HasThreshold

Trait for shared param threshold.

HasThresholds

Trait for shared param thresholds.

HasTol

Trait for shared param tol.

HasTrainingSummary<T>

Trait for models that provide a training summary.

HasValidationIndicatorCol

Trait for shared param validationIndicatorCol.

HasVarianceCol

Trait for shared param varianceCol.

HasVarianceImpurity

HasWeightCol

Trait for shared param weightCol.

HdfsUtils

HingeGradient

Compute gradient and loss for a Hinge loss function, as used in SVM binary classification.
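
The textbook hinge loss for a single data point, and its subgradient with respect to the weights, can be sketched as below. The sketch assumes labels encoded as -1 or +1 and uses hypothetical names; the label encoding and scaling used by HingeGradient itself may differ.

```java
// Hypothetical helper, not Spark's HingeGradient: textbook hinge loss with labels in {-1, +1}.
final class HingeSketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** loss = max(0, 1 - y * (w . x)) */
    static double loss(double[] w, double[] x, int y) {
        return Math.max(0.0, 1.0 - y * dot(w, x));
    }

    /** Subgradient with respect to w: -y * x inside the margin, zero otherwise. */
    static double[] gradient(double[] w, double[] x, int y) {
        double[] g = new double[w.length];
        if (y * dot(w, x) < 1.0) {
            for (int i = 0; i < g.length; i++) {
                g[i] = -y * x[i];
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 0.0};
        double[] x = {0.5, 2.0};
        System.out.println(loss(w, x, 1));
        System.out.println(java.util.Arrays.toString(gradient(w, x, 1)));
    }
}
```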

Histogram

An interface to represent an equi-height histogram, which is a part of ColumnStatistics.

HistogramBin

An interface to represent a bin in an equi-height histogram.

HiveCatalogMetrics

Metrics for access to the hive external catalog.

HttpSecurityFilter

A servlet filter that implements HTTP security features.

Identifiable

Trait for an object with an immutable unique ID that identifies itself and its derivatives.

Identifier

Identifies an object in a catalog.

IdentifierHelper

IdentityColumnSpec

Identity column specification.

IDF

Compute the Inverse Document Frequency (IDF) given a collection of documents.

IDF

Inverse document frequency (IDF).

IDF.DocumentFrequencyAggregator

Document frequency aggregator.

IDFBase

Params for IDF and IDFModel.

IDFModel

Model fitted by IDF.

IDFModel

Represents an IDF model that can transform term frequency vectors.

IDFModel.Data$

ImageDataSource

The image package implements the Spark SQL data source API for loading image data as a DataFrame.

ImageSchema

Defines the image schema and methods to read and manipulate images.

Impurities

Factory for Impurity instances.

Impurity

Trait for calculating information gain.

Imputer

Imputation estimator for completing missing values, using the mean, median or mode of the columns in which the missing values are located.

ImputerModel

Model fitted by Imputer.

ImputerParams

Params for Imputer and ImputerModel.

A filter that evaluates to true iff the attribute evaluates to one of the values in the array.

IncompatibleMergeException

IndeterminateStringType

String type that was the result of coercing two different non-explicit collations.

IndexedRow

Represents a row of IndexedRowMatrix.

IndexedRowMatrix

Represents a row-oriented DistributedMatrix with indexed rows.

IndexToString

A Transformer that maps a column of indices back to a new column of corresponding string values.

IndylambdaScalaClosures

InformationGainStats

Information gain statistics for each split. Params: gain (information gain value), impurity (current node impurity), leftImpurity (left node impurity), rightImpurity (right node impurity), leftPredict (left node predict), rightPredict (right node predict).

InnerClosureFinder

InProcessLauncher

In-process launcher for Spark applications.

Input

Specifies an input that can be referenced by another Dataset's query.

InputDStream<T>

This is the abstract base class for all input streams.

InputFileBlockHolder

This holds file names of the current Spark task.

InputFormatInfo

:: DeveloperApi :: Parses and holds information about inputFormat (and files) specified as a parameter.

InputMetricDistributions

InputMetrics

InputPartition

A serializable representation of an input partition returned by Batch.planInputPartitions() and the corresponding ones in streaming.

InputReadOptions

Generic options for a read of an input.

InsertableRelation

A BaseRelation that can be used to insert data into it through the insert method.

IntArrayParam

Specialized version of Param[Array[Int]] for Java.

IntegerExactNumeric

IntegerType

The data type representing Int values.

IntegerTypeExpression

IntegralTypeExpression

InteractableTerm

A term that may be part of an interaction, e.g.

Interaction

Implements the feature interaction transform.

InternalAccumulator

A collection of fields and methods concerned with internal accumulators that represent task level metrics.

InternalAccumulator.input$

InternalAccumulator.output$

InternalAccumulator.shuffleRead$

InternalAccumulator.shuffleWrite$

InternalKMeansModelWriter

A writer for KMeans that handles the "internal" (or default) format

InternalLinearRegressionModelWriter

A writer for LinearRegression that handles the "internal" (or default) format

InternalNode

Internal Decision Tree node.

InterruptibleIterator<T>

:: DeveloperApi :: An iterator that wraps around an existing iterator to provide task killing functionality.

IntParam

Specialized version of Param[Int] for Java.

IntParam

An extractor object for parsing strings into integers.

IsNotNull

A filter that evaluates to true iff the attribute evaluates to a non-null value.

IsNull

A filter that evaluates to true iff the attribute evaluates to null.

IsotonicRegression

Isotonic regression.

IsotonicRegression

Isotonic regression.

IsotonicRegressionBase

Params for isotonic regression.

IsotonicRegressionModel

Model fitted by IsotonicRegression.

IsotonicRegressionModel

Regression model for isotonic regression.

IsotonicRegressionModel.Data$

JavaDoubleRDD

JavaDStream<T>

A Java-friendly interface to DStream, the basic abstraction in Spark Streaming that represents a continuous stream of data.

JavaDStreamLike<T,This extends JavaDStreamLike<T,This,R>,R extends JavaRDDLike<T,R>>

JavaFutureAction<T>

JavaHadoopRDD<K,V>

JavaInputDStream<T>

A Java-friendly interface to InputDStream.

JavaIterableWrapperSerializer

A Kryo serializer for serializing results returned by asJavaIterable.

JavaMapWithStateDStream<KeyType,ValueType,StateType,MappedType>

DStream representing the stream of data generated by mapWithState operation on a JavaPairDStream.

JavaModuleOptions

This helper class is used to place some JVM runtime options (e.g., `--add-opens`) required by Spark when using Java 17.

JavaNewHadoopRDD<K,V>

JavaPackage

A dummy class as a workaround to show the package doc of spark.mllib in generated Java API docs.

JavaPairDStream<K,V>

A Java-friendly interface to a DStream of key-value pairs, which provides extra methods like reduceByKey and join.

JavaPairInputDStream<K,V>

A Java-friendly interface to InputDStream of key-value pairs.

JavaPairRDD<K,V>

JavaPairReceiverInputDStream<K,V>

A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.

JavaParams

Java-friendly wrapper for Params.

JavaRDD<T>

JavaRDDLike<T,This extends JavaRDDLike<T,This>>

Defines operations common to several Java RDD implementations.

JavaReceiverInputDStream<T>

A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.

JavaSerializer

:: DeveloperApi :: A Spark serializer that uses Java's built-in serialization.

JavaSparkContext

A Java-friendly version of SparkContext that returns JavaRDDs and works with Java collections instead of Scala ones.

JavaSparkStatusTracker

Low-level status reporting APIs for monitoring job and stage progress.

JavaStreamingContext

Deprecated.

This is deprecated as of Spark 3.4.0.

JavaStreamingListenerEvent

Base trait for events related to JavaStreamingListener

JavaUtils

JavaUtils.SerializableMapWrapper<A,B>

JdbcConnectionProvider

::DeveloperApi:: Connection provider which opens connections to various databases (a database-specific instance is needed).

JdbcDialect

:: DeveloperApi :: Encapsulates everything (extensions, workarounds, quirks) to handle the SQL dialect of a certain database or jdbc driver.

JdbcDialects

:: DeveloperApi :: Registry of dialects that apply to every new jdbc org.apache.spark.sql.DataFrame.

JdbcRDD<T>

Deprecated.

Jdbc RDD is deprecated, consider using JDBC data source instead.

JdbcRDD.ConnectionFactory

JdbcSQLQueryBuilder

The builder to build a single SELECT query.

JdbcType

:: DeveloperApi :: A database type definition coupled with the jdbc type needed to send null values to the database.

JettyUtils

Utilities for launching a web server using Jetty's HTTP Server class

JettyUtils.ServletParams<T>

JettyUtils.ServletParams$

JobData

JobDataUtil

JobExecutionStatus

JobExecutionStatusSerializer

JobFailed

JobGeneratorEvent

Event classes for JobGenerator

JobListener

Interface used to listen for job completion or failure events after submitting a job to the DAGScheduler.

JobResult

:: DeveloperApi :: A result of a job in the DAGScheduler.

JobSchedulerEvent

JobSubmitter

Handle via which a "run" function passed to a ComplexFutureAction can submit jobs for execution.

JobSucceeded

JoinPushdownAliasGenerator

JoinType

Enum representing the join type in public API.

JsonMatrixConverter

JsonProtocol

Serializes SparkListener events to/from JSON.

JWSFilter

A servlet filter that requires JWS, a cryptographically signed JSON Web Token, in the header.

KernelDensity

Kernel density estimation.

KeyGroupedPartitioning

Represents a partitioning where rows are split across partitions based on the partition transform expressions returned by KeyGroupedPartitioning.keys.

KeyValueGroupedDataset<K,V>

A Dataset that has been logically grouped by a user-specified grouping key.

KillTask

KinesisInitialPositions

KinesisInitialPositions.AtTimestamp

KinesisInitialPositions.Latest

KinesisInitialPositions.TrimHorizon

KinesisUtilsPythonHelper

This is a helper class that wraps the methods in KinesisUtils into more Python-friendly classes and functions so that they can be easily instantiated and called from Python's KinesisUtils.

KMeans

K-means clustering with support for k-means|| initialization proposed by Bahmani et al.

KMeans

K-means clustering with a k-means++ like initialization mode (the k-means|| algorithm by Bahmani et al).

KMeansAggregator

KMeansAggregator computes the distances and updates the centers for blocks in sparse or dense matrix in an online fashion.

KMeansDataGenerator

Generate test data for KMeans.

KMeansModel

Model fitted by KMeans.

KMeansModel

A clustering model for K-means.

KMeansModel.Cluster$

KMeansModel.OldData$

KMeansModel.SaveLoadV1_0$

KMeansModel.SaveLoadV2_0$

KMeansParams

Common params for KMeans and KMeansModel

KMeansSummary

Summary of KMeans.

KnownSizeEstimation

A trait that allows a class to give SizeEstimator more accurate size estimation.

KolmogorovSmirnovTest

Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous distribution.

KolmogorovSmirnovTest

Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous distribution.

KolmogorovSmirnovTest.NullHypothesis$

KolmogorovSmirnovTestResult

Object containing the test results for the Kolmogorov-Smirnov test.

KryoRegistrator

Interface implemented by clients to register their classes with Kryo when using Kryo serialization.

KryoSerializer

A Spark serializer that uses the Kryo serialization library.

KVUtils

L1Updater

Updater for L1 regularized problems.

LabeledPoint

Class that represents the features and label of a data point.

LabeledPoint

Class that represents the features and labels of a data point.

LabelPropagation

Label Propagation algorithm.

LAPACK

LAPACK routines for MLlib's vectors and matrices.

LassoModel

Regression model trained using Lasso.

LassoWithSGD

Train a regression model with L1-regularization using Stochastic Gradient Descent.

Layer

Trait that holds Layer properties, that are needed to instantiate it.

LayerModel

Trait that holds Layer weights (or parameters).

LBFGS

Class used to solve an optimization problem using Limited-memory BFGS.

LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

LDAModel

Model fitted by LDA.

LDAModel

Latent Dirichlet Allocation (LDA) model.

LDAOptimizer

An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can hold optimizer-specific parameters for users to set.

LDAParams

LDAUtils

Utility methods for LDA.

LeafNode

Decision tree leaf node.

LeastSquaresGradient

Compute gradient and loss for a least-squares loss function, as used in linear regression.
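As a rough sketch of what "gradient and loss" means for a single example, the snippet below uses the squared-error loss 0.5 * (w·x - y)^2, whose gradient with respect to w is (w·x - y) * x. The class name `LeastSquaresSketch`, its method names, and the 0.5 scaling are assumptions made for illustration, not something this index states about LeastSquaresGradient.

```java
// Illustrative only; not Spark's LeastSquaresGradient.
// Assumes the per-example loss 0.5 * (w.x - y)^2.
final class LeastSquaresSketch {
    // Loss for one example with features x, label y, and weights w.
    static double loss(double[] w, double[] x, double y) {
        double diff = dot(w, x) - y;
        return 0.5 * diff * diff;
    }

    // Gradient of the loss above with respect to w: (w.x - y) * x.
    static double[] gradient(double[] w, double[] x, double y) {
        double diff = dot(w, x) - y;
        double[] g = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            g[i] = diff * x[i];
        }
        return g;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }
}
```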

LessThan

A filter that evaluates to true iff the attribute evaluates to a value less than value.

LessThanOrEqual

A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.

LexicalThreadLocal<T>

Helper trait for defining thread locals with lexical scoping.

LexicalThreadLocal.Handle

Final class representing a handle to a thread local value.

LibSVMDataSource

The libsvm package implements the Spark SQL data source API for loading LIBSVM data as a DataFrame.

LinearDataGenerator

Generate sample data used for Linear Data.

LinearRegression

Linear regression.

LinearRegressionModel

Model produced by LinearRegression.

LinearRegressionModel

Regression model trained using LinearRegression.

LinearRegressionParams

Params for linear regression.

LinearRegressionSummary

Linear regression results evaluated on a dataset.

LinearRegressionTrainingSummary

Linear regression training results.

LinearRegressionWithSGD

Train a linear regression model with no regularization using Stochastic Gradient Descent.

LinearSVC

Linear SVM Classifier

LinearSVCModel

Linear SVM Model trained by LinearSVC

LinearSVCModel.Data$

LinearSVCParams

Params for linear SVM Classifier.

LinearSVCSummary

Abstraction for LinearSVC results for a given model.

LinearSVCSummaryImpl

LinearSVC results for a given model.

LinearSVCTrainingSummary

Abstraction for LinearSVC training results.

LinearSVCTrainingSummaryImpl

LinearSVC training results.

ListenerBus<L,E>

An event bus which posts events to its listeners.

ListState<S>

Interface used for arbitrary stateful operations with the v2 API to capture list value state.

Lit

Convenience extractor for any Literal.

Literal<T>

Represents a constant literal value in the public expression API.

LiveEntityHelpers

LiveExecutorStageSummary

LiveJob

LiveRDD

Tracker for data related to a persisted RDD.

LiveRDDDistribution

LiveRDDPartition

Data about a single partition of a cached RDD.

LiveResourceProfile

LiveSpeculationStageSummary

LiveStage

LiveTask

Loader<M extends Saveable>

Trait for classes which can load models and transformers from files.

LoadInstanceEnd<T>

Event fired after MLReader.load.

LoadInstanceStart<T>

Event fired before MLReader.load.

LoadTableException

Exception raised when a flow fails to read from a table defined within the pipeline

LocalKMeans

A utility object to run K-means locally.

LocalLDAModel

Local (non-distributed) model fitted by LDA.

LocalLDAModel

Local LDA model.

LocalLDAModel.LocalModelData$

LocalScan

A special Scan which will happen on Driver locally instead of Executors.

LogBlockId

Identifies a block of log data.

LogBlockIdGenerator

LogBlockIdGenerator is responsible for generating unique LogBlockIds for log blocks.

LogBlockType

LogicalDistributions

LogicalExpressions

Helper methods for working with the logical expressions API.

LogicalWriteInfo

This interface contains logical write information that data sources can use when generating a WriteBuilder.

LogisticGradient

Compute gradient and loss for a multinomial logistic loss function, as used in multi-class classification (it is also used in binary logistic regression).

LogisticRegression

Logistic regression.

LogisticRegressionDataGenerator

Generate test data for LogisticRegression.

LogisticRegressionModel

Model produced by LogisticRegression.

LogisticRegressionModel

Classification model trained using Multinomial/Binary Logistic Regression.

LogisticRegressionModel.Data$

LogisticRegressionParams

Params for logistic regression.

LogisticRegressionSummary

Abstraction for logistic regression results for a given model.

LogisticRegressionSummaryImpl

Multiclass logistic regression results for a given model.

LogisticRegressionTrainingSummary

Abstraction for multiclass logistic regression training results.

LogisticRegressionTrainingSummaryImpl

Multiclass logistic regression training results.

LogisticRegressionWithLBFGS

Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS.

LogisticRegressionWithSGD

Train a classification model for Binary Logistic Regression using Stochastic Gradient Descent.

LogLine

Base class representing a log line.

LogLoss

Class for log loss calculation (for classification).

LogNormalGenerator

Generates i.i.d.

LogUtils

:: DeveloperApi :: Utils for querying Spark logs with Spark SQL.

LongAccumulator

An accumulator for computing sum, count, and average of 64-bit integers.
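The sum, count, and average bookkeeping described above can be sketched with a small single-threaded class. `LongStatsSketch` and its methods are hypothetical names used only for this example; it does not reproduce the AccumulatorV2 contract.

```java
// Hypothetical stand-in, not Spark's LongAccumulator: tracks the running
// sum and count of 64-bit integers and derives the average from them.
final class LongStatsSketch {
    private long sum;
    private long count;

    void add(long v) {
        sum += v;
        count++;
    }

    // Folds another partial result into this one, as a driver might do
    // when combining per-task results.
    void merge(LongStatsSketch other) {
        sum += other.sum;
        count += other.count;
    }

    long sum() { return sum; }

    long count() { return count; }

    // Returns NaN when nothing has been added (0.0 / 0).
    double avg() { return (double) sum / count; }
}
```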

LongAccumulatorSource

LongExactNumeric

LongParam

Specialized version of Param[Long] for Java.

LongType

The data type representing Long values.

LongTypeExpression

LookupCatalog

A trait to encapsulate catalog lookup function and helpful extractors.

LookupCatalog.AsTableIdentifier

Extract legacy table identifier from a multi-part identifier.

LookupCatalog.AsTableIdentifier$

Extract legacy table identifier from a multi-part identifier.

LookupCatalog.CatalogAndIdentifier

Extract catalog and identifier from a multi-part name with the current catalog if needed.

LookupCatalog.CatalogAndIdentifier$

Extract catalog and identifier from a multi-part name with the current catalog if needed.

LookupCatalog.CatalogAndNamespace

Extract catalog and namespace from a multi-part name with the current catalog if needed.

LookupCatalog.CatalogAndNamespace$

Extract catalog and namespace from a multi-part name with the current catalog if needed.

LookupCatalog.NonSessionCatalogAndIdentifier

Extract non-session catalog and identifier from a multi-part identifier.

LookupCatalog.NonSessionCatalogAndIdentifier$

Extract non-session catalog and identifier from a multi-part identifier.

LookupCatalog.SessionCatalogAndIdentifier

Extract session catalog and identifier from a multi-part identifier.

LookupCatalog.SessionCatalogAndIdentifier$

Extract session catalog and identifier from a multi-part identifier.

Loss

Trait for adding "pluggable" loss functions for the gradient boosting algorithm.

Losses

LossFunction

Trait for loss function

LossReasonPending

A loss reason that means we don't yet know why the executor exited.

LowPrioritySQLImplicits

Lower priority implicit methods for converting Scala objects into Datasets.

LSHParams

Params for LSH.

LZ4CompressionCodec

:: DeveloperApi :: LZ4 implementation of CompressionCodec.

LZFCompressionCodec

:: DeveloperApi :: LZF implementation of CompressionCodec.

MapFunction<T,U>

Base interface for a map function used in Dataset's map function.

MapGroupsFunction<K,V,R>

Base interface for a map function used in GroupedDataset's mapGroup function.

MapGroupsWithStateFunction<K,V,S,R>

::Experimental:: Base interface for a map function used in KeyValueGroupedDataset.mapGroupsWithState(MapGroupsWithStateFunction, org.apache.spark.sql.Encoder, org.apache.spark.sql.Encoder)

MapOutputCommitMessage

:: Private :: Represents the result of writing map outputs for a shuffle map task.

MapOutputMetadata

:: Private :: An opaque metadata tag for registering the result of committing the output of a shuffle map task.

MapOutputTrackerMasterMessage

MapOutputTrackerMessage

MapPartitionsFunction<T,U>

Base interface for function used in Dataset's mapPartitions.

MappedPoolMemory

MapperRowCounter

An AccumulatorV2 counter for collecting a list of (mapper index, row count).

MapState<K,V>

Interface used for arbitrary stateful operations with the v2 API to capture map value state.

MapStatus

Result returned by a ShuffleMapTask to a scheduler.

MapType

The data type for Maps.

MapWithStateDStream<KeyType,ValueType,StateType,MappedType>

DStream representing the stream of data generated by mapWithState operation on a pair DStream.

Matrices

Factory methods for Matrix.

Matrices

Factory methods for Matrix.

Matrix

Trait for a local matrix.

Matrix

Trait for a local matrix.

MatrixEntry

Represents an entry in a distributed matrix.

MatrixFactorizationModel

Model representing the result of matrix factorization.

MatrixFactorizationModel.SaveLoadV1_0$

MatrixImplicits

Implicit methods available in Scala for converting org.apache.spark.mllib.linalg.Matrix to org.apache.spark.ml.linalg.Matrix and vice versa.

MavenUtils

Provides utility functions to be used inside SparkSubmit.

MavenUtils.MavenCoordinate$

Max

An aggregate function that returns the maximum value in a group.

MaxAbsScaler

Rescale each feature individually to range [-1, 1] by dividing through the largest maximum absolute value in each feature.
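A minimal sketch of the rescaling described above: each column is divided by its largest absolute value, so every entry ends up in [-1, 1]. The class `MaxAbsScalingSketch`, the plain `double[][]` layout, and the choice to leave all-zero columns at zero are assumptions for illustration only.

```java
// Illustrative only; not Spark's MaxAbsScaler.
final class MaxAbsScalingSketch {
    // rows[i][j] is feature j of sample i. Assumes at least one row and
    // equal-length rows. Returns a scaled copy; the input is not modified.
    static double[][] scale(double[][] rows) {
        int n = rows[0].length;
        double[] maxAbs = new double[n];
        for (double[] r : rows) {
            for (int j = 0; j < n; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(r[j]));
            }
        }
        double[][] out = new double[rows.length][n];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < n; j++) {
                // Assumption: a column that is entirely zero stays zero.
                out[i][j] = maxAbs[j] == 0.0 ? 0.0 : rows[i][j] / maxAbs[j];
            }
        }
        return out;
    }
}
```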

MaxAbsScalerModel

Model fitted by MaxAbsScaler.

MaxAbsScalerModel.Data$

MaxAbsScalerParams

Params for MaxAbsScaler and MaxAbsScalerModel.

MaxLength

MemoryEntry<T>

MemoryEntryBuilder<T>

MemoryMetrics

MemoryParam

An extractor object for parsing JVM memory strings, such as "10g", into an Int representing the number of megabytes.
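To make the "10g" example concrete, here is a small sketch of parsing such strings into megabytes. The class name `MemoryStringSketch`, the accepted suffixes (k, m, g, t), and the treatment of a bare number as a byte count are assumptions for this example and are not taken from Spark's source.

```java
import java.util.Locale;

// Hypothetical helper, not Spark's implementation: converts a JVM-style
// memory string such as "10g" or "512m" into a whole number of megabytes.
final class MemoryStringSketch {
    static int toMegabytes(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty memory string");
        }
        char unit = s.charAt(s.length() - 1);
        String digits = s.substring(0, s.length() - 1);
        long mb;
        switch (unit) {
            case 'k': mb = Long.parseLong(digits) / 1024L; break;
            case 'm': mb = Long.parseLong(digits); break;
            case 'g': mb = Long.parseLong(digits) * 1024L; break;
            case 't': mb = Long.parseLong(digits) * 1024L * 1024L; break;
            default:
                // Assumption: a string without a suffix is a byte count.
                // Anything else non-numeric fails with NumberFormatException.
                mb = Long.parseLong(s) / (1024L * 1024L);
        }
        return Math.toIntExact(mb);
    }
}
```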

MergeIntoWriter<T>

MergeIntoWriter provides methods to define and execute merge actions based on specified conditions.

MergeSummary

Provides an informational summary of the MERGE operation producing write.

MetaAlgorithmReadWrite

Default Meta-Algorithm read and write implementation.

Metadata

Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean, Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and Array[Metadata].

MetadataBuilder

Builder for Metadata.

MetadataColumn

Interface for a metadata column.

MetadataUtils

Helper utilities for algorithms using ML metadata

MethodIdentifier<T>

Helper class to identify a method.

Metric

MetricsSystemInstances

MetricUtils

MFDataGenerator

Generate RDD(s) containing data for Matrix Factorization.

MicroBatchStream

A SparkDataStream for streaming queries with micro-batch mode.

Milliseconds

Helper object that creates instance of Duration representing a given number of milliseconds.

Min

An aggregate function that returns the minimum value in a group.

MinHashLSH

LSH class for Jaccard distance.

MinHashLSHModel

Model produced by MinHashLSH, where multiple hash functions are stored.

MinHashLSHModel.Data$

MinMaxScaler

Rescale each feature individually to a common range [min, max] linearly using column summary statistics, which is also known as min-max normalization or Rescaling.
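The linear rescaling can be sketched for a single feature column as follows. `MinMaxScalingSketch` is a hypothetical name, and mapping a constant column to the midpoint of the target range is a choice made for this sketch.

```java
// Illustrative only; not Spark's MinMaxScaler.
final class MinMaxScalingSketch {
    // Maps the observed range [lo, hi] of `values` linearly onto [min, max].
    static double[] rescale(double[] values, double min, double max) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant column maps to the middle of [min, max].
            out[i] = (hi == lo)
                ? 0.5 * (max + min)
                : (values[i] - lo) / (hi - lo) * (max - min) + min;
        }
        return out;
    }
}
```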

MinMaxScalerModel

Model fitted by MinMaxScaler.

MinMaxScalerModel.Data$

MinMaxScalerParams

Params for MinMaxScaler and MinMaxScalerModel.

Minutes

Helper object that creates instance of Duration representing a given number of minutes.

MiscellaneousProcessDetails

:: DeveloperApi :: Stores information about a miscellaneous process to pass from the scheduler to SparkListeners.

MitigationConfig

A spark config flag that can be used to mitigate a breaking change.

MLAllowListedLoader

MLEvent

Event emitted by ML operations.

MLEvents

A small trait that defines some methods to send MLEvent.

MLFormatRegister

ML export formats should implement this trait so that users can specify a shortname rather than the fully qualified class name of the exporter.

MLPairRDDFunctions<K,V>

Machine learning specific Pair RDD functions.

MLReadable<T>

Trait for objects that provide MLReader.

MLReader<T>

Abstract class for utility classes that can load ML instances.

MLUtils

Helper methods to load, save and pre-process data used in MLlib.

MLWritable

Trait for classes that provide MLWriter.

MLWriter

Abstract class for utility classes that can save ML instances in Spark's internal format.

MLWriterFormat

Abstract class to be implemented by objects that provide ML exportability.

Model<M extends Model<M>>

A fitted model, i.e., a Transformer produced by an Estimator.

MsSqlServerDialect

MulticlassClassificationEvaluator

Evaluator for multiclass classification, which expects input columns: prediction, label, weight (optional) and probability (only for logLoss).

MulticlassMetrics

Evaluator for multiclass classification.

MultilabelClassificationEvaluator

:: Experimental :: Evaluator for multi-label classification, which expects two input columns: prediction and label.

MultilabelMetrics

Evaluator for multilabel classification.

MultilayerPerceptronClassificationModel

Classification model based on the Multilayer Perceptron.

MultilayerPerceptronClassificationModel.Data$

MultilayerPerceptronClassificationSummary

Abstraction for MultilayerPerceptronClassification results for a given model.

MultilayerPerceptronClassificationSummaryImpl

MultilayerPerceptronClassification results for a given model.

MultilayerPerceptronClassificationTrainingSummary

Abstraction for MultilayerPerceptronClassification training results.

MultilayerPerceptronClassificationTrainingSummaryImpl

MultilayerPerceptronClassification training results.

MultilayerPerceptronClassifier

Classifier trainer based on the Multilayer Perceptron.

MultilayerPerceptronParams

Params for Multilayer Perceptron.

MultivariateGaussian

This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.

MultivariateGaussian

This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.

MultivariateOnlineSummarizer

MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector format in an online fashion.

MultivariateStatisticalSummary

Trait for multivariate statistical summary of a data matrix.

MutableAggregationBuffer

A Row representing a mutable aggregation buffer.

MutablePair<T1,T2>

:: DeveloperApi :: A tuple of 2 elements.

MutableURLClassLoader

URL class loader that exposes the `addURL` method in URLClassLoader.

MySQLDialect

NaiveBayes

Naive Bayes Classifiers.

NaiveBayes

Trains a Naive Bayes model given an RDD of (label, features) pairs.

NaiveBayesModel

Model produced by NaiveBayes

NaiveBayesModel

Model for Naive Bayes Classifiers.

NaiveBayesModel.Data$

NaiveBayesModel.SaveLoadV1_0$

NaiveBayesModel.SaveLoadV2_0$

NaiveBayesParams

Params for Naive Bayes Classifiers.

NamedReference

Represents a field or column reference in the public logical expression API.

NamedTransform

Convenience extractor for any Transform.

NamespaceChange

NamespaceChange subclasses represent requested changes to a namespace.

NamespaceChange.RemoveProperty

A NamespaceChange to remove a namespace property.

NamespaceChange.SetProperty

A NamespaceChange to set a namespace property.

NarrowDependency<T>

:: DeveloperApi :: Base class for dependencies where each partition of the child RDD depends on a small number of partitions of the parent RDD.

NewHadoopRDD<K,V>

:: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).

NewHadoopRDD.NewHadoopMapPartitionsWithSplitRDD$

NGram

A feature transformer that converts the input array of strings into an array of n-grams.
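For illustration, an n-gram transform over a list of tokens can be sketched as a sliding window. The class `NGramSketch` and the choice to join tokens with a single space are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not Spark's NGram transformer.
final class NGramSketch {
    // Returns every run of n consecutive tokens, joined by a space.
    // Assumes n >= 1; inputs shorter than n yield an empty list.
    static List<String> ngrams(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }
}
```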

NioBufferedFileInputStream

InputStream implementation which uses direct buffer to read a file to avoid extra copy of data between Java and native memory which happens when using BufferedInputStream.

NNLS

Object used to solve nonnegative least squares problems using a modified projected gradient method.

NNLS.Workspace

NoConstraint

Node

Decision tree node interface.

Node

Node in a decision tree.

Node

NoFlows

Used to specify that no flows should be refreshed.

NoLegacyJDBCError

Make the classifyException method throw out the original exception

NominalAttribute

A nominal attribute.

NoopDialect

NOOP dialect object, always returning the neutral element.

NormalEquationSolver

Interface for classes that solve the normal equations locally.

Normalizer

Normalize a vector to have unit norm using the given p-norm.
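A short sketch of p-norm normalization for a finite p >= 1: compute (sum |x_i|^p)^(1/p) and divide each component by it. `PNormSketch` is a hypothetical name; leaving the zero vector unchanged and not handling p = infinity are choices of this sketch.

```java
// Illustrative only; not Spark's Normalizer.
final class PNormSketch {
    // Scales v so that its p-norm is 1. Assumes a finite p >= 1.
    static double[] normalize(double[] v, double p) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        double norm = Math.pow(sum, 1.0 / p);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            // Assumption: the zero vector is returned unchanged.
            out[i] = norm == 0.0 ? 0.0 : v[i] / norm;
        }
        return out;
    }
}
```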

Normalizer

Normalizes samples individually to unit L^p norm

Not

A predicate that evaluates to true iff child is evaluated to false.

Not

A filter that evaluates to true iff child is evaluated to false.

NoTables

Used to select no tables.

NullOrdering

A null order used in sorting expressions.

NullType

The data type representing NULL values.

NumericAttribute

A numeric attribute with optional summary statistics.

NumericHistogram

A generic, re-usable histogram class that supports partial aggregations.

NumericHistogram.Coord

The Coord class defines a histogram bin, which is just an (x,y) pair.

NumericParser

Simple parser for a numeric structure consisting of three types:

NumericType

Numeric data types.

NumericTypeExpression

ObjectType

Observation

Helper class to simplify usage of Dataset.observe(String, Column, Column*).

OffHeapExecutionMemory

OffHeapStorageMemory

OffHeapUnifiedMemory

Offset

An abstract representation of progress through a MicroBatchStream or ContinuousStream.

OneHotEncoder

A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index.
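The phrase "at most a single one-value per row" can be illustrated with a sketch in which the last category is optionally dropped and therefore encoded as all zeros. The class `OneHotSketch` and the `dropLast` flag are assumptions made for this example.

```java
// Illustrative only; not Spark's OneHotEncoder.
final class OneHotSketch {
    // Encodes a category index as a binary vector. With dropLast, the
    // vector has numCategories - 1 slots and the last category is all zeros.
    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] out = new double[size];
        if (index < size) {
            out[index] = 1.0;
        }
        return out;
    }
}
```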

OneHotEncoderBase

Private trait for params and common methods for OneHotEncoder and OneHotEncoderModel

OneHotEncoderCommon

Provides some helper methods used by OneHotEncoder.

OneHotEncoderModel

param: categorySizes Original number of categories for each feature being encoded.

OneHotEncoderModel.Data$

OneToOneDependency<T>

:: DeveloperApi :: Represents a one-to-one dependency between partitions of the parent and child RDDs.

OneVsRest

Reduction of Multiclass Classification to Binary Classification.

OneVsRestModel

Model produced by OneVsRest.

OneVsRestParams

Params for OneVsRest.

OnHeapExecutionMemory

OnHeapStorageMemory

OnHeapUnifiedMemory

OnlineLDAOptimizer

An online optimizer for LDA.

Optimizer

Trait for optimization problem solvers.

Optional<T>

Like java.util.Optional in Java 8, scala.Option in Scala, and com.google.common.base.Optional in Google Guava, this class represents a value of a given type that may or may not exist.

Or

A predicate that evaluates to true iff at least one of left or right evaluates to true.

Or

A filter that evaluates to true iff at least one of left or right evaluates to true.

OracleDialect

OrderedDistribution

A distribution where tuples have been ordered across partitions according to ordering expressions, but not necessarily within a given partition.

OrderedRDDFunctions<K,V,P extends scala.Product2<K,V>>

Extra functions available on RDDs of (key, value) pairs where the key is sortable through an implicit conversion.

Output

Represents a node in a DataflowGraph that can be written to by a Flow.

OutputCommitCoordinationMessage

OutputMetricDistributions

OutputMetrics

OutputMode

OutputMode describes what data will be written to a streaming sink when there is new data available in a streaming DataFrame/Dataset.

OutputOperationInfo

:: DeveloperApi :: Class having information on output operations.

PagedTable<T>

A paged table that will generate an HTML table for a specified page and also the page navigation.

PageRank

PageRank algorithm implementation.

Pair<L,R>

An immutable pair of values.

PairDStreamFunctions<K,V>

Extra functions available on DStream of (key, value) pairs through an implicit conversion.

PairFlatMapFunction<T,K,V>

A function that returns zero or more key-value pair records from each input record.

PairFunction<T,K,V>

A function that returns key-value pairs (Tuple2<K, V>), and can be used to construct PairRDDs.

PairRDDFunctions<K,V>

Extra functions available on RDDs of (key, value) pairs through an implicit conversion.

PairwiseRRDD<T>

Form an RDD[(Int, Array[Byte])] from key-value pairs returned from R.

Param<T>

A param with self-contained documentation and optionally default value.

ParamGridBuilder

Builder for a param grid used in grid search-based model selection.

ParamMap

A param to value map.

ParamPair<T>

A param and its value.

Params

Trait for components that take parameters.

ParamValidators

Factory methods for common validation functions for Param.isValid.

ParentClassLoader

A class loader which makes some protected methods in ClassLoader accessible.

PartialResult<R>

Partition

An identifier for a partition in an RDD.

PartitionCoalescer

::DeveloperApi:: A PartitionCoalescer defines how to coalesce the partitions of a given RDD.

Partitioner

An object that defines how the elements in a key-value pair RDD are partitioned by key.
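One common way to partition by key is to take the key's hash code modulo the number of partitions, adjusted to be non-negative. The sketch below shows that idea; `HashPartitionSketch` is a hypothetical name, and sending null keys to partition 0 is an assumption of this example.

```java
// Illustrative only; a hash-based scheme, not Spark's Partitioner API.
final class HashPartitionSketch {
    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0; // Assumption: null keys go to the first partition.
        }
        int mod = key.hashCode() % numPartitions;
        // Java's % can be negative for negative hash codes; shift it back.
        return mod < 0 ? mod + numPartitions : mod;
    }
}
```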

PartitionEvaluator<T,U>

An evaluator for computing RDD partitions.

PartitionEvaluatorFactory<T,U>

A factory to create PartitionEvaluator.

PartitionGroup

::DeveloperApi:: A group of Partitions. param: prefLoc preferred location for the partition group

PartitionHelper

Partitioning

An interface to represent the output data partitioning for a data source, which is returned by SupportsReportPartitioning.outputPartitioning().

PartitioningUtils

PartitionOffset

Used for per-partition offsets in continuous processing.

PartitionPruningRDD<T>

:: DeveloperApi :: An RDD used to prune RDD partitions so we can avoid launching tasks on all partitions.

PartitionReader<T>

A partition reader returned by PartitionReaderFactory.createReader(InputPartition) or PartitionReaderFactory.createColumnarReader(InputPartition).

PartitionReaderFactory

A factory used to create PartitionReader instances.

PartitionStrategy

Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.

PartitionStrategy.CanonicalRandomVertexCut$

Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical direction, resulting in a random vertex cut that colocates all edges between two vertices, regardless of direction.
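The "canonical direction" idea can be sketched by ordering the two vertex IDs before hashing, so that an edge and its reverse land in the same partition. `CanonicalCutSketch` and the use of `Objects.hash` are assumptions for illustration; the hash function GraphX actually uses is not described in this index.

```java
import java.util.Objects;

// Illustrative only; not GraphX's CanonicalRandomVertexCut.
final class CanonicalCutSketch {
    // Hashes the (smaller, larger) vertex ID pair so that edge direction
    // does not affect the chosen partition.
    static int partition(long src, long dst, int numParts) {
        long lo = Math.min(src, dst);
        long hi = Math.max(src, dst);
        return Math.floorMod(Objects.hash(lo, hi), numParts);
    }
}
```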

PartitionStrategy.EdgePartition1D$

Assigns edges to partitions using only the source vertex ID, colocating edges with the same source.

PartitionStrategy.EdgePartition2D$

Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix, guaranteeing a 2 * sqrt(numParts) bound on vertex replication.

PartitionStrategy.RandomVertexCut$

Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a random vertex cut that colocates all same-direction edges between two vertices.

PCA

PCA trains a model to project vectors to a lower dimensional space of the top PCA.k principal components.

PCA

A feature transformer that projects vectors to a low-dimensional space using PCA.

PCAModel

Model fitted by PCA.

PCAModel

Model fitted by PCA that can project vectors to a low-dimensional space using PCA.

PCAModel.Data$

PCAParams

Params for PCA and PCAModel.

PCAUtil

PearsonCorrelation

Compute Pearson correlation for two RDDs of the type RDD[Double] or the correlation matrix for an RDD of the type RDD[Vector].
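For two equal-length series, the Pearson correlation is the covariance divided by the product of the standard deviations. The sketch below computes it on plain arrays rather than RDDs; `PearsonSketch` is a hypothetical name used only for this example.

```java
// Illustrative only; operates on local arrays, not on RDDs.
final class PearsonSketch {
    // Returns the Pearson correlation of x and y. The result is NaN when
    // either input is constant, since its variance is then zero.
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and equal in length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }
}
```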

PersistedView

Representing a persisted View in a DataflowGraph.

PhysicalWriteInfo

This interface contains physical write information that data sources can use when generating a DataWriterFactory or a StreamingDataWriterFactory.

Pipeline

A simple pipeline, which acts as an estimator.

Pipeline.SharedReadWrite$

Methods for MLReader and MLWriter shared between Pipeline and PipelineModel

PipelineEvent

An internal event that is emitted during the run of a pipeline.

PipelineEventOrigin

Describes where the event originated from. param: datasetName The name of the dataset. param: flowName The name of the flow. param: sourceCodeLocation The location of the source code.

PipelineExecution

Executes a DataflowGraph by resolving the graph, materializing datasets, and running the flows.

PipelineModel

Represents a fitted pipeline.

PipelinesErrors

PipelinesTableProperties

Interface for validating and accessing Pipeline-specific table properties.

PipelineStage

A stage in a pipeline, either an Estimator or a Transformer.

PipelineTableProperty<T>

PipelineUpdateContext

PipelineUpdateContextImpl

An implementation of the PipelineUpdateContext trait used in production.

PluginContext

:: DeveloperApi :: Context information and operations for plugins loaded by Spark.

PMMLExportable

Export model to the PMML format. Predictive Model Markup Language (PMML) is an XML-based file format developed by the Data Mining Group (www.dmg.org).

PMMLKMeansModelWriter

A writer for KMeans that handles the "pmml" format

PMMLLinearRegressionModelWriter

A writer for LinearRegression that handles the "pmml" format

PMMLModelExport

PMMLModelExportFactory

PoissonBounds

Utility functions that help us determine bounds on adjusted sampling rate to guarantee exact sample sizes with high confidence when sampling with replacement.

PoissonGenerator

Generates i.i.d.

PoissonSampler<T>

:: DeveloperApi :: A sampler for sampling with replacement, based on values drawn from Poisson distribution.

PolynomialExpansion

Perform feature expansion in a polynomial space.

PortableDataStream

A class that allows DataStreams to be serialized and moved around by not creating them until they need to be read

PostgresDialect

PowerIterationClustering

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.

PowerIterationClustering

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.

PowerIterationClustering.Assignment

Cluster assignment.

PowerIterationClustering.Assignment$

PowerIterationClusteringModel

Model produced by PowerIterationClustering.

PowerIterationClusteringModel.SaveLoadV1_0$

PowerIterationClusteringParams

Common params for PowerIterationClustering

PowerIterationClusteringWrapper

Precision

Precision.

Predicate

The general representation of predicate expressions, which contains the upper-cased expression name and all the children expressions.

Predict

Predicted value for a node. param: predict predicted value. param: prob probability of the label (classification only).

PredictionModel<FeaturesType,M extends PredictionModel<FeaturesType,M>>

Abstraction for a model for prediction tasks (regression and classification).

Predictor<FeaturesType,Learner extends Predictor<FeaturesType,Learner,M>,M extends PredictionModel<FeaturesType,M>>

Abstraction for prediction problems (regression and classification).

PredictorParams

(private[ml]) Trait for parameters for prediction (regression and classification).

PrefixSpan

A parallel PrefixSpan algorithm to mine frequent sequential patterns.

PrefixSpan

A parallel PrefixSpan algorithm to mine frequent sequential patterns.

PrefixSpan.FreqSequence<Item>

Represents a frequent sequence.

PrefixSpan.Postfix$

PrefixSpan.Prefix$

PrefixSpanModel<Item>

Model fitted by PrefixSpan. param: freqSequences frequent sequences.

PrefixSpanModel.SaveLoadV1_0$

PrefixSpanParams

PrefixSpanWrapper

Pregel

Implements a Pregel-like bulk-synchronous message-passing API.

PrimaryKey

A PRIMARY KEY constraint.

PrimaryKey.Builder

ProbabilisticClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>>

Model produced by a ProbabilisticClassifier.

ProbabilisticClassifier<FeaturesType,E extends ProbabilisticClassifier<FeaturesType,E,M>,M extends ProbabilisticClassificationModel<FeaturesType,M>>

Single-label binary or multiclass classifier which can output class conditional probabilities.

ProbabilisticClassifierParams

(private[classification]) Params for probabilistic classification.

Procedure

A base interface for all procedures.

ProcedureCatalog

A catalog API for working with procedures.

ProcedureParameter

A procedure parameter.

ProcedureParameter.Builder

ProcedureParameter.Mode

An enum representing procedure parameter modes.

ProcessSummary

ProcessTreeMetrics

ProtobufSerDe<T>

:: DeveloperApi :: ProtobufSerDe represents the API for serializing and deserializing Protobuf data related to the UI.

ProtobufUtils

ProxyRedirectHandler

A Jetty handler to handle redirects to a proxy server.

PrunedFilteredScan

A BaseRelation that can eliminate unneeded columns and filter using selected predicates before producing an RDD containing all matching tuples as Row objects.

PrunedScan

A BaseRelation that can eliminate unneeded columns before producing an RDD containing all of its tuples as Row objects.

Pseudorandom

:: DeveloperApi :: A class with pseudorandom behavior.

PushBasedFetchHelper

Helper class for ShuffleBlockFetcherIterator that encapsulates all the push-based functionality to fetch push-merged block meta and shuffle chunks.

PythonStreamBlockId

PythonStreamingListener

PythonStreamingQueryListener

Py4J allows a pure interface so this proxy is required.

QRDecomposition<QType,RType>

Represents QR factors.

QuantileDiscretizer

QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features.
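
The binning idea can be illustrated with a small standard-library sketch. It is not Spark's implementation: it computes exact quantiles on an in-memory list, and the names `quantile_splits` and `discretize` are hypothetical helpers introduced only for this example.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_splits(values, num_buckets):
    """Return num_buckets - 1 interior split points at evenly spaced quantiles."""
    return quantiles(values, n=num_buckets, method="inclusive")


def discretize(value, splits):
    """Map a continuous value to a bucket index in [0, len(splits)]."""
    # A value equal to a split point falls into the upper bucket.
    return bisect_right(splits, value)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
splits = quantile_splits(data, 4)
buckets = [discretize(v, splits) for v in data]
print(splits)
print(buckets)
```

Each bucket index plays the role of a categorical value; with four buckets over eight evenly spaced inputs, every bucket receives two values.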

QuantileDiscretizerBase

Params for QuantileDiscretizer.

QuantileStrategy

Enum for selecting the quantile calculation strategy

QueryContext

Query context of a SparkThrowable.

QueryContext

Contains the catalog and database context information for query execution.

QueryContextType

The type of QueryContext.

QueryExecutionFailure

Indicates that run has failed due to a query execution failure.

QueryExecutionListener

The interface of query execution listener that can be used to analyze execution metrics.

QueryInfo

Represents the query info provided to the stateful processor used in the arbitrary state API v2 to easily identify task retries on the same partition.

QueryOrigin

Records information used to track the provenance of a given query to user code.

QueryOrigin.ExceptionHelpers

QueryOriginType

RandomBlockReplicationPolicy

RandomDataGenerator<T>

Trait for random data generators that generate i.i.d.

RandomForest

ALGORITHM

RandomForest

A class that implements a Random Forest learning algorithm for classification and regression.

RandomForestClassificationModel

Random Forest model for classification.

RandomForestClassificationSummary

Abstraction for multiclass RandomForestClassification results for a given model.

RandomForestClassificationSummaryImpl

Multiclass RandomForestClassification results for a given model.

RandomForestClassificationTrainingSummary

Abstraction for multiclass RandomForestClassification training results.

RandomForestClassificationTrainingSummaryImpl

Multiclass RandomForestClassification training results.

RandomForestClassifier

Random Forest learning algorithm for classification.

RandomForestClassifierParams

RandomForestModel

Represents a random forest model.

RandomForestParams

Parameters for Random Forest algorithms.

RandomForestRegressionModel

Random Forest model for regression.

RandomForestRegressor

Random Forest learning algorithm for regression.

RandomForestRegressorParams

RandomRDDs

Generator methods for creating RDDs comprised of i.i.d. samples from some distribution.

RandomSampler<T,U>

:: DeveloperApi :: A pseudorandom sampler.

RangeDependency<T>

:: DeveloperApi :: Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.

RangePartitioner<K,V>

A Partitioner that partitions sortable records by range into roughly equal ranges.
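
As a rough illustration of range partitioning (not Spark's sampling-based implementation), the sketch below derives upper bounds from a sorted sample of keys and assigns each key to a partition by binary search. The helper names `range_bounds` and `partition_for` are assumptions made for this example.

```python
from bisect import bisect_left


def range_bounds(sample_keys, num_partitions):
    """Pick num_partitions - 1 upper bounds that split the sorted sample evenly."""
    ordered = sorted(sample_keys)
    n = len(ordered)
    return [
        ordered[max((i * n) // num_partitions - 1, 0)]
        for i in range(1, num_partitions)
    ]


def partition_for(key, bounds):
    """Return the index of the first range whose upper bound is >= key."""
    return bisect_left(bounds, key)


keys = [7, 2, 9, 4, 1, 8, 3, 6, 5]
bounds = range_bounds(keys, 3)
assignment = {k: partition_for(k, bounds) for k in keys}
print(bounds)
print(assignment)
```

Because the bounds come from the key distribution itself, each partition ends up with roughly the same number of records, and keys within partition i are all smaller than keys in partition i + 1.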

RankingEvaluator

:: Experimental :: Evaluator for ranking, which expects two input columns: prediction and label.

RankingMetrics<T>

Evaluator for ranking algorithms.

RateEstimator

A component that estimates the rate at which an InputDStream should ingest records, based on updates at every batch completion.

Rating

A more compact class to represent a rating than Tuple3[Int, Int, Double].

RawTextHelper

RawTextSender

A helper program that sends blocks of Kryo-serialized text strings out on a socket at a specified rate.

RBackendAuthHandler

Authentication handler for connections from the R process.

RDD<T>

A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.

RDDBarrier<T>

:: Experimental :: Wraps an RDD in a barrier stage, which forces Spark to launch tasks of this stage together.

RDDBlockId

RDDDataDistribution

RDDFunctions<T>

Machine learning specific RDD functions.

RDDInfo

RDDPartitionInfo

RDDPartitionSeq

A custom sequence of partitions based on a mutable linked list.

RDDStorageInfo

ReadableChannelFileRegion

ReadAheadInputStream

InputStream implementation which asynchronously reads ahead from the underlying input stream when a specified amount of data has been read from the current buffer.

ReadAllAvailable

Represents a ReadLimit where the MicroBatchStream must scan all the data available at the streaming source.

ReadLimit

Interface representing limits on how much to read from a MicroBatchStream when it implements SupportsAdmissionControl.

ReadMaxBytes

Represents a ReadLimit where the MicroBatchStream should scan files whose total size does not exceed a given maximum total size.

ReadMaxFiles

Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of files.

ReadMaxRows

Represents a ReadLimit where the MicroBatchStream should scan approximately the given maximum number of rows.

ReadMinRows

Represents a ReadLimit where the MicroBatchStream should scan approximately at least the given minimum number of rows.

Recall

Recall.

ReceivedBlock

Trait representing a received block

ReceivedBlockHandler

Trait that represents a class that handles the storage of blocks received by a receiver

ReceivedBlockStoreResult

Trait that represents the metadata related to storage of blocks

ReceivedBlockTrackerLogEvent

Trait representing any event in the ReceivedBlockTracker that updates its state.

Receiver<T>

:: DeveloperApi :: Abstract class of a receiver that can be run on worker nodes to receive external data.

ReceiverInfo

:: DeveloperApi :: Class having information about a receiver

ReceiverInputDStream<T>

Abstract class for defining any InputDStream that has to start a receiver on worker nodes to receive external data.

ReceiverMessage

Messages sent to the Receiver.

ReceiverState

Enumeration to identify current state of a Receiver

ReceiverTrackerLocalMessage

Messages used by the driver and ReceiverTrackerEndpoint to communicate locally.

ReceiverTrackerMessage

Messages used by the NetworkReceiver and the ReceiverTracker to communicate with each other.

RecursiveFlag

ReduceFunction<T>

Base interface for function used in Dataset's reduce.

Reducer<I,O>

A 'reducer' for output of user-defined functions.

ReducibleFunction<I,O>

Base class for user-defined functions that can be 'reduced' on another function.

Ref

Convenience extractor for any NamedReference.

RegexTokenizer

A regex based tokenizer that extracts tokens either by using the provided regex pattern to split the text (default) or repeatedly matching the regex (if gaps is false).
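
The two modes can be sketched with Python's `re` module. This is only an illustration of the split-versus-match distinction; the function `regex_tokenize` and its parameter names are hypothetical and merely echo the behaviour described above.

```python
import re


def regex_tokenize(text, pattern=r"\s+", gaps=True, to_lowercase=True,
                   min_token_length=1):
    """Tokenize text by splitting on the pattern (gaps=True) or by
    repeatedly matching it (gaps=False)."""
    if to_lowercase:
        text = text.lower()
    # The pattern is assumed to contain no capture groups; groups would
    # change what re.split and re.findall return.
    tokens = re.split(pattern, text) if gaps else re.findall(pattern, text)
    # Drop empty strings and tokens shorter than the minimum length.
    return [t for t in tokens if len(t) >= min_token_length]


print(regex_tokenize("Hello  Spark World"))
print(regex_tokenize("a1b22c", pattern=r"\d+", gaps=False))
```

With `gaps=True` the pattern describes the separators; with `gaps=False` it describes the tokens themselves.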

RegressionEvaluator

Evaluator for regression, which expects input columns prediction, label and an optional weight column.

RegressionMetrics

Evaluator for regression.

RegressionModel<FeaturesType,M extends RegressionModel<FeaturesType,M>>

Model produced by a Regressor.

RegressionModel

Regressor<FeaturesType,Learner extends Regressor<FeaturesType,Learner,M>,M extends RegressionModel<FeaturesType,M>>

Single-label regression

RelationalGroupedDataset

A set of methods for aggregations on a DataFrame, created by groupBy, cube or rollup (and also pivot).

RelationProvider

Implemented by objects that produce relations for a specific kind of data source.

ReportsSinkMetrics

A mix-in interface for streaming sinks to signal that they can report metrics.

ReportsSourceMetrics

A mix-in interface for SparkDataStream streaming sources to signal that they can report metrics.

RequestMethod

RequiresDistributionAndOrdering

A write that requires a specific distribution and ordering of data.

ResolutionCompletedFlow

A Flow whose flow function has been invoked, meaning either: - Its output schema and dependencies are known.

ResolutionFailedFlow

A Flow whose flow function has failed to resolve.

ResolvedFlow

A Flow whose flow function has successfully resolved.

ResolvedInput

A wrapper for a resolved internal input that includes the alias provided by the user.

ResourceAllocator

Trait used to help executor/worker allocate resources.

ResourceAmountUtils

ResourceDiscoveryPlugin

:: DeveloperApi :: A plugin that can be dynamically loaded into a Spark application to control how custom resources are discovered.

ResourceDiscoveryScriptPlugin

The default plugin that is loaded into a Spark application to control how custom resources are discovered.

ResourceID

Resource identifier.

ResourceInformation

Class to hold information about a type of Resource.

ResourceInformationJson

A case class to simplify JSON serialization of ResourceInformation.

ResourceProfile

Resource profile to associate with an RDD.

ResourceProfile.DefaultProfileExecutorResources$

ResourceProfile.ExecutorResourcesOrDefaults$

ResourceProfileBuilder

Resource profile builder to build a ResourceProfile to associate with an RDD.

ResourceProfileInfo

ResourceRequest

Class that represents a resource request.

ResourceUtils

ResubmitFailedStages

Resubmitted

:: DeveloperApi :: A org.apache.spark.scheduler.ShuffleMapTask that completed successfully earlier, but we lost the executor before the stage completed.

ReturnStatementFinder

ReviveOffers

RewritableTransform

Allows Spark to rewrite the given references of the transform during analysis.

RFormula

Implements the transforms required for fitting a dataset against an R model formula.

RFormulaBase

Base trait for RFormula and RFormulaModel.

RFormulaModel

Model fitted by RFormula.

RFormulaParser

Limited implementation of R formula parsing.

RidgeRegressionModel

Regression model trained using RidgeRegression.

RidgeRegressionWithSGD

Train a regression model with L2-regularization using Stochastic Gradient Descent.

RobustScaler

Scale features using statistics that are robust to outliers.
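
A minimal sketch of outlier-robust scaling, assuming the robust statistics are the median and the interquartile range (IQR) of a single in-memory column. The function `robust_scale` and its flags are hypothetical; Spark's own parameters and defaults may differ.

```python
from statistics import median, quantiles


def robust_scale(column, with_centering=True, with_scaling=True):
    """Center by the median and scale by the interquartile range."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    center = median(column) if with_centering else 0.0
    spread = (q3 - q1) if with_scaling else 1.0
    if spread == 0:
        spread = 1.0  # avoid division by zero for constant columns
    return [(x - center) / spread for x in column]


values = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(values)
print(scaled)
```

The outlier 100.0 stays far from the rest after scaling, but it does not distort the center or spread used for the other values, which is the point of using median and IQR instead of mean and standard deviation.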

RobustScalerModel

Model fitted by RobustScaler.

RobustScalerModel.Data$

RobustScalerParams

Params for RobustScaler and RobustScalerModel.

RollingPolicy

Defines the policy based on which RollingFileAppender will generate rolling files.

Row

Represents one row of output from a relational operator.

RowFactory

A factory class used to construct Row objects.

RowLevelOperation

A logical representation of a data source DELETE, UPDATE, or MERGE operation that requires rewriting data.

RowLevelOperation.Command

A row-level SQL command.

RowLevelOperationBuilder

An interface for building a RowLevelOperation.

RowLevelOperationInfo

An interface with logical information for a row-level operation such as DELETE, UPDATE, MERGE.

RowMatrix

Represents a row-oriented distributed Matrix with no meaningful row indices.

RpcUtils

RRDD<T>

An RDD that stores serialized R objects as Array[Byte].

RRunnerModes

RunCompletion

Indicates that a triggered run has successfully completed execution.

RunFailure

Indicates that a run entered the failed state.

RunTerminationException

Helper exception class that indicates that a run has to be terminated and tracks the associated termination reason.

RunTerminationReason

RuntimeConfig

Runtime configuration interface for Spark.

RWrappers

This is the Scala stub of SparkR read.ml.

RWrapperUtils

SafeJsonSerializer

SamplePathFilter

Filter that allows loading a fraction of HDFS files.

SamplingUtils

Saveable

Trait for models and transformers which may be saved as files.

SaveInstanceEnd

Event fired after MLWriter.save.

SaveInstanceStart

Event fired before MLWriter.save.

SaveMode

SaveMode is used to specify the expected behavior of saving a DataFrame to a data source.

ScalarFunction<R>

Interface for a function that produces a result value for each input row.

Scan

A logical representation of a data source scan.

Scan.ColumnarSupportMode

This enum defines how the columnar support for the partitions of the data source should be determined.

ScanBuilder

An interface for building the Scan.

Schedulable

An interface for schedulable entities.

SchedulableBuilder

An interface to build a Schedulable tree. buildPools: builds the tree nodes (pools). addTaskSetManager: builds the leaf nodes (TaskSetManagers).

SchedulerBackend

A backend interface for scheduling systems that allows plugging in different ones under TaskSchedulerImpl.

SchedulerBackendUtils

SchedulerPool

SchedulingAlgorithm

An interface for sort algorithms. FIFO: FIFO algorithm between TaskSetManagers. FS: FS algorithm between Pools, and FIFO or FS within Pools.

SchedulingMode

"FAIR" and "FIFO" determine which policy is used to order tasks amongst a Schedulable's sub-queues. "NONE" is used when a Schedulable has no sub-queues.

SchemaInferenceUtils

SchemaMergingUtils

SchemaRelationProvider

Implemented by objects that produce relations for a specific kind of data source with a given schema.

SchemaUtils

Utils for handling schemas.

SchemaUtils

Utils for handling schemas.

SchemaUtils.ColumnPath$

Seconds

Helper object that creates instance of Duration representing a given number of seconds.

SecurityConfigurationLock

There are cases when global JVM security configuration must be modified.

SecurityUtils

Various utility methods used by Spark Security.

SelectorParams

Params for Selector and SelectorModel.

SequenceFileRDDFunctions<K,V>

Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile, through an implicit conversion.

SerDe

Utility functions to serialize and deserialize objects to and from R.

SerializableConfiguration

Hadoop configuration but serializable.

SerializableWritable<T extends org.apache.hadoop.io.Writable>

SerializationDebugger

SerializationDebugger.ObjectStreamClassMethods

An implicit class that allows us to call private methods of ObjectStreamClass.

SerializationDebugger.ObjectStreamClassMethods$

SerializationFormats

SerializationStream

:: DeveloperApi :: A stream for writing serialized objects.

SerializedMemoryEntry<T>

SerializedValuesHolder<T>

A holder for storing the serialized values.

Serializer

:: DeveloperApi :: A serializer.

SerializerHelper

SerializerInstance

:: DeveloperApi :: An instance of a serializer, for use by one thread at a time.

SessionConfigSupport

A mix-in interface for TableProvider.

SharedParamsCodeGen

Code generator for shared params (sharedParams.scala).

ShortestPaths

Computes shortest paths to the given set of landmark vertices, returning a graph where each vertex attribute is a map containing the shortest-path distance to each reachable landmark.
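
The shape of the result (a per-vertex map from landmark to distance) can be sketched on a single machine with a reverse breadth-first search. This sketch assumes directed edges and counts distance in hops; `shortest_paths` here is a hypothetical stand-alone function, not the GraphX API.

```python
from collections import deque


def shortest_paths(edges, landmarks):
    """Return {vertex: {landmark: hop_count}} for every reachable landmark."""
    reverse = {}      # dst -> list of src, so we can walk edges backwards
    vertices = set()
    for src, dst in edges:
        vertices.update((src, dst))
        reverse.setdefault(dst, []).append(src)

    result = {v: {} for v in vertices}
    for landmark in landmarks:
        if landmark not in vertices:
            continue
        result[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            current = queue.popleft()
            for prev in reverse.get(current, []):
                if landmark not in result[prev]:
                    result[prev][landmark] = result[current][landmark] + 1
                    queue.append(prev)
    return result


edges = [(1, 2), (2, 3), (4, 3), (5, 6)]
distances = shortest_paths(edges, landmarks=[3])
print(distances)
```

Vertices that cannot reach a landmark simply have no entry for it in their map, which matches the "each reachable landmark" wording above.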

ShortExactNumeric

ShortType

The data type representing Short values.

ShuffleChecksumBlockId

ShuffleDataBlockId

ShuffleDataIO

:: Private :: An interface for plugging in modules for storing and reading temporary shuffle data.

ShuffleDependency<K,V,C>

ShuffledRDD<K,V,C>

:: DeveloperApi :: The resulting RDD from a shuffle (e.g.

ShuffleDriverComponents

:: Private :: An interface for building shuffle support modules for the Driver.

ShuffleExecutorComponents

:: Private :: An interface for building shuffle support for Executors.

ShuffleFetchCompletionListener

A listener to be called at the completion of the ShuffleBlockFetcherIterator. Param: data, the ShuffleBlockFetcherIterator to process.

ShuffleIndexBlockId

ShuffleMapOutputWriter

:: Private :: A top-level writer that returns child writers for persisting the output of a map task, and then commits all of the writes as one atomic operation.

ShuffleMergedBlockId

ShuffleMergedDataBlockId

ShuffleMergedIndexBlockId

ShuffleMergedMetaBlockId

ShuffleOutputStatus

A common trait between MapStatus and MergeStatus.

ShufflePartitionWriter

:: Private :: An interface for opening streams to persist partition bytes to a backing data store.

ShufflePushBlockId

ShufflePushReadMetricDistributions

ShufflePushReadMetrics

ShuffleReadMetricDistributions

ShuffleReadMetrics

ShuffleStatus

Helper class used by the MapOutputTrackerMaster to perform bookkeeping for a single ShuffleMapStage.

ShuffleWriteMetricDistributions

ShuffleWriteMetrics

ShutdownHookManager

Various utility methods used by Spark.

SignalUtils

Contains utilities for working with posix signals.

SimpleFutureAction<T>

A FutureAction holding the result of an action that triggers a single job.

SimpleMetricsCachedBatch

A CachedBatch that stores some simple metrics that can be used for filtering of batches with the SimpleMetricsCachedBatchSerializer.

SimpleMetricsCachedBatchSerializer

Provides basic filtering for CachedBatchSerializer implementations.

SimpleUpdater

A simple updater for gradient descent *without* any regularization.

SingleSpillShuffleMapOutputWriter

Optional extension for partition writing that is optimized for transferring a single file to the backing store.

SingleValueExecutorMetricType

SingularValueDecomposition<UType,VType>

Represents singular value decomposition (SVD) factors.

Sink

SinkImpl

SinkProgress

Information about progress made for a sink in the execution of a StreamingQuery during a trigger.

SinkProgressSerializer

SinkWrite

A StreamingFlowExecution that writes a streaming DataFrame to a Sink.

SizeEstimator

:: DeveloperApi :: Estimates the sizes of Java objects (number of bytes of memory they occupy), for use in memory-aware caches.

SnappyCompressionCodec

:: DeveloperApi :: Snappy implementation of CompressionCodec.

SnowflakeDialect

SomeTables

Used in partial graph updates to select "selectedTables".

SortDirection

A sort direction used in sorting expressions.

SortOrder

Represents a sort order in the public expression API.

Source

SourceProgress

Information about progress made for a source in the execution of a StreamingQuery during a trigger.

SourceProgressSerializer

SparkAppHandle

A handle to a running Spark application.

SparkAppHandle.Listener

Listener for updates to a handle's state.

SparkAppHandle.State

Represents the application's state.

SparkAWSCredentials

Serializable interface providing a method executors can call to obtain an AWSCredentialsProvider instance for authenticating to AWS services.

SparkAWSCredentials.Builder

Builder for SparkAWSCredentials instances.

SparkConf

Configuration for a Spark application.

SparkContext

Main entry point for Spark functionality.

SparkDataStream

The base interface representing a readable data stream in a Spark streaming query.

SparkEnv

:: DeveloperApi :: Holds all the runtime environment objects for a running Spark instance (either master or worker), including the serializer, RpcEnv, block manager, map output tracker, etc.

SparkExecutorInfo

Exposes information about Spark Executors.

SparkExecutorInfoImpl

SparkExitCode

SparkFiles

Resolves paths to files added through SparkContext.addFile().

SparkFileUtils

SparkFilterApi

TODO (PARQUET-1809): This is a temporary workaround; it is intended to be moved to Parquet.

SparkFirehoseListener

Class that allows users to receive all SparkListener events.

SparkHadoopMapRedUtil

SparkJobInfo

Exposes information about Spark Jobs.

SparkJobInfoImpl

SparkLauncher

Launcher for Spark applications.

SparkListener

:: DeveloperApi :: A default implementation for SparkListenerInterface that has no-op implementations for all callbacks.

SparkListenerApplicationEnd

SparkListenerApplicationStart

SparkListenerBlockManagerAdded

SparkListenerBlockManagerRemoved

SparkListenerBlockUpdated

SparkListenerBus

A SparkListenerEvent bus that relays SparkListenerEvents to its listeners

SparkListenerEnvironmentUpdate

SparkListenerEvent

SparkListenerExecutorAdded

SparkListenerExecutorBlacklisted

Deprecated. Use SparkListenerExecutorExcluded instead.

SparkListenerExecutorBlacklistedForStage

Deprecated. Use SparkListenerExecutorExcludedForStage instead.

SparkListenerExecutorExcluded

SparkListenerExecutorExcludedForStage

SparkListenerExecutorMetricsUpdate

Periodic updates from executors.

SparkListenerExecutorRemoved

SparkListenerExecutorUnblacklisted

Deprecated. Use SparkListenerExecutorUnexcluded instead.

SparkListenerExecutorUnexcluded

SparkListenerInterface

Interface for listening to events from the Spark scheduler.

SparkListenerJobEnd

SparkListenerJobStart

SparkListenerLogStart

An internal class that describes the metadata of an event log.

SparkListenerMiscellaneousProcessAdded

SparkListenerNodeBlacklisted

Deprecated. Use SparkListenerNodeExcluded instead.

SparkListenerNodeBlacklistedForStage

Deprecated. Use SparkListenerNodeExcludedForStage instead.

SparkListenerNodeExcluded

SparkListenerNodeExcludedForStage

SparkListenerNodeUnblacklisted

Deprecated. Use SparkListenerNodeUnexcluded instead.

SparkListenerNodeUnexcluded

SparkListenerResourceProfileAdded

SparkListenerSpeculativeTaskSubmitted

SparkListenerStageCompleted

SparkListenerStageExecutorMetrics

Peak metric values for the executor for the stage, written to the history log at stage completion.

SparkListenerStageSubmitted

SparkListenerTaskEnd

SparkListenerTaskGettingResult

SparkListenerTaskStart

SparkListenerUnpersistRDD

SparkListenerUnschedulableTaskSetAdded

SparkListenerUnschedulableTaskSetRemoved

SparkMasterRegex

A collection of regexes for extracting information from the master string.

SparkPath

A canonical representation of a file path.

SparkPlugin

:: DeveloperApi :: A plugin that can be dynamically loaded into a Spark application.

SparkSchemaUtils

Utils for handling schemas.

SparkSerDeUtils

SparkSession

The entry point to programming Spark with the Dataset and DataFrame API.

SparkSession.Builder

SparkSessionExtensions

:: Experimental :: Holder for injection points to the SparkSession.

SparkSessionExtensionsProvider

Base trait for implementations used by SparkSessionExtensions

SparkSessionUtils

SparkShutdownHook

SparkStageInfo

Exposes information about Spark Stages.

SparkStageInfoImpl

SparkStatusTracker

Low-level status reporting APIs for monitoring job and stage progress.

SparkTestUtils.JavaSourceFromString

SparkThreadUtils

SparkThrowable

Interface mixed into Throwables thrown from Spark.

SparkThrowableHelper

Companion object used by instances of SparkThrowable to access error class information and construct error messages.

SparseMatrix

Column-major sparse matrix.

SparseMatrix

Column-major sparse matrix.

SparseVector

A sparse vector represented by an index array and a value array.

SparseVector

A sparse vector represented by an index array and a value array.

SpatialType

SpearmanCorrelation

Compute Spearman's correlation for two RDDs of the type RDD[Double] or the correlation matrix for an RDD of the type RDD[Vector].
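
Spearman's correlation is Pearson's correlation applied to ranks. The sketch below shows that for two plain Python lists, using average ranks for ties; the helper names are hypothetical and the code makes no claim about how Spark distributes the computation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))


rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0])
print(rho)
```

The second series is a nonlinear but monotonic function of the first, so the rank correlation is (up to floating-point rounding) 1 even though the relationship is not linear.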

SpecialLengths

SpeculationStageSummary

SpillListener

A SparkListener that detects whether spills have occurred in Spark jobs.

Split

Interface for a "Split," which specifies a test made at a decision tree node to choose the left or right path.

Split

Split applied to a feature. Params: feature, the feature index; threshold, the threshold for a continuous feature.

SplitInfo

SQLContext

The entry point for working with structured data (rows and columns) in Spark 1.x.

SQLContextCompanion

This SQLContext object contains utility functions to create a singleton SQLContext instance, or to get the created SQLContext instance.

SQLDataTypes

SQL data types for vectors and matrices.

SqlGraphElementRegistrationException

SqlGraphRegistrationContext

SQL statement processor context.

SqlGraphRegistrationContext.SqlQueryPlanWithOrigin

Class that holds the logical plan and query origin parsed from a SQL statement.

SqlGraphRegistrationContext.SqlQueryPlanWithOrigin$

SqlGraphRegistrationContextState

Data class for all state that is accumulated while processing a particular SqlGraphRegistrationContext.

SQLImplicits

A collection of implicit methods for converting common Scala objects into Datasets.

SQLOpenHashSet<T>

SQLPlanMetricSerializer

SQLTransformer

Implements the transformations which are defined by a SQL statement.

SQLUserDefinedType

:: DeveloperApi :: A user-defined type which can be automatically recognized by a SQLContext and registered.

SQLUtils

SquaredError

Class for squared error loss calculation.

SquaredEuclideanSilhouette

SquaredEuclideanSilhouette computes the average of the Silhouette over all the data of the dataset, which is a measure of how appropriately the data have been clustered.

SquaredEuclideanSilhouette.ClusterStats

SquaredEuclideanSilhouette.ClusterStats$

SquaredL2Updater

Updater for L2 regularized problems.

StagedTable

Represents a table which is staged for being committed to the metastore.

StageInfo

:: DeveloperApi :: Stores information about a stage to pass from the scheduler to SparkListeners.

StageStatus

StageStatusSerializer

StagingTableCatalog

An optional mix-in for implementations of TableCatalog that support staging creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.

StandardNormalGenerator

Generates i.i.d.

StandardScaler

Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.
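
For a single column, standardization amounts to subtracting the mean and dividing by the standard deviation. The sketch below assumes the sample (n - 1) standard deviation and an in-memory list; `standard_scale` and its flags are hypothetical names for illustration only.

```python
from statistics import mean, stdev


def standard_scale(column, with_mean=True, with_std=True):
    """Remove the mean and scale to unit sample standard deviation."""
    mu = mean(column) if with_mean else 0.0
    sigma = stdev(column) if with_std else 1.0
    if sigma == 0:
        sigma = 1.0  # leave constant columns unscaled
    return [(x - mu) / sigma for x in column]


scaled = standard_scale([2.0, 4.0, 6.0])
print(scaled)
```

The summary statistics are computed once on the training data and then reused to transform new data, which is why fitting and transforming are separate steps in a scaler.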

StandardScaler

Standardizes features by removing the mean and scaling to unit std using column summary statistics on the samples in the training set.

StandardScalerModel

Model fitted by StandardScaler.

StandardScalerModel

Represents a StandardScaler model that can transform vectors.

StandardScalerModel.Data$

StandardScalerParams

Params for StandardScaler and StandardScalerModel.

StatCounter

A class for tracking the statistics of a set of numbers (count, mean and variance) in a numerically robust way.
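
One common way to track count, mean and variance in a numerically robust way is Welford's online update, combined with a pairwise merge so that partial results from different partitions can be joined. The sketch below shows that technique; `RunningStats` is a hypothetical class, and whether Spark uses exactly these formulas is an assumption.

```python
class RunningStats:
    """Tracks count, mean and sum of squared deviations (m2) incrementally."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, x):
        """Welford's update: avoids summing large squares directly."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        """Combine statistics computed independently on another chunk."""
        if other.count == 0:
            return self
        if self.count == 0:
            self.count, self.mean, self.m2 = other.count, other.mean, other.m2
            return self
        delta = other.mean - self.mean
        total = self.count + other.count
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total
        return self

    def variance(self):
        """Population variance; returns 0.0 for an empty tracker."""
        return self.m2 / self.count if self.count else 0.0


left = RunningStats().add(1.0).add(2.0)
right = RunningStats().add(3.0).add(4.0)
merged = left.merge(right)
print(merged.count, merged.mean, merged.variance())
```

Merging two half-sized trackers gives the same count, mean and variance as feeding all four values into one tracker, which is what makes the approach suitable for per-partition aggregation.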

State

State<S>

:: Experimental :: Abstract class for getting and updating the state in mapping function used in the mapWithState operation of a pair DStream (Scala) or a JavaPairDStream (Java).

StatefulProcessor<K,I,O>

Represents the arbitrary stateful logic that needs to be provided by the user to perform stateful manipulations on keyed streams.

StatefulProcessorHandle

Represents the operation handle provided to the stateful processor used in the arbitrary state API v2.

StatefulProcessorWithInitialState<K,I,O,S>

Stateful processor with support for specifying initial state.

StateOperatorProgress

Information about updates made to stateful operators in a StreamingQuery during a trigger.

StateOperatorProgressSerializer

StateSpec<KeyType,ValueType,StateType,MappedType>

:: Experimental :: Abstract class representing all the specifications of the DStream transformation mapWithState operation of a pair DStream (Scala) or a JavaPairDStream (Java).

StaticSources

Statistics

API for statistical functions in MLlib.

Statistics

An interface to represent statistics for a data source, which is returned by SupportsReportStatistics.estimateStatistics().

StatsdMetricType

StatsReportListener

:: DeveloperApi :: Simple SparkListener that logs a few summary statistics when each stage completes.

StatsReportListener

:: DeveloperApi :: A simple StreamingListener that logs summary statistics across Spark Streaming batches. Param: numBatchInfos, the number of last batches to consider for generating statistics (default: 10).

StatusUpdate

StopAllReceivers

This message will trigger ReceiverTrackerEndpoint to send stop signals to all registered receivers.

StopWordsRemover

A feature transformer that filters out stop words from input.

StorageLevel

:: DeveloperApi :: Flags for controlling the storage of an RDD.

StorageLevelMapper

A mapper class that makes it easy to obtain storage levels based on their names.

StorageLevels

Expose some commonly useful storage level constants.

StorageUtils

Helper methods for storage-related objects.

StoreTypes

StoreTypes.AccumulableInfo

Protobuf type org.apache.spark.status.protobuf.AccumulableInfo

StoreTypes.AccumulableInfo.Builder

Protobuf type org.apache.spark.status.protobuf.AccumulableInfo

StoreTypes.AccumulableInfoOrBuilder

StoreTypes.ApplicationAttemptInfo

Protobuf type org.apache.spark.status.protobuf.ApplicationAttemptInfo

StoreTypes.ApplicationAttemptInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationAttemptInfo

StoreTypes.ApplicationAttemptInfoOrBuilder

StoreTypes.ApplicationEnvironmentInfo

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfo

StoreTypes.ApplicationEnvironmentInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfo

StoreTypes.ApplicationEnvironmentInfoOrBuilder

StoreTypes.ApplicationEnvironmentInfoWrapper

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfoWrapper

StoreTypes.ApplicationEnvironmentInfoWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationEnvironmentInfoWrapper

StoreTypes.ApplicationEnvironmentInfoWrapperOrBuilder

StoreTypes.ApplicationInfo

Protobuf type org.apache.spark.status.protobuf.ApplicationInfo

StoreTypes.ApplicationInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationInfo

StoreTypes.ApplicationInfoOrBuilder

StoreTypes.ApplicationInfoWrapper

Protobuf type org.apache.spark.status.protobuf.ApplicationInfoWrapper

StoreTypes.ApplicationInfoWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ApplicationInfoWrapper

StoreTypes.ApplicationInfoWrapperOrBuilder

StoreTypes.AppSummary

Protobuf type org.apache.spark.status.protobuf.AppSummary

StoreTypes.AppSummary.Builder

Protobuf type org.apache.spark.status.protobuf.AppSummary

StoreTypes.AppSummaryOrBuilder

StoreTypes.CachedQuantile

Protobuf type org.apache.spark.status.protobuf.CachedQuantile

StoreTypes.CachedQuantile.Builder

Protobuf type org.apache.spark.status.protobuf.CachedQuantile

StoreTypes.CachedQuantileOrBuilder

StoreTypes.DeterministicLevel

Protobuf enum org.apache.spark.status.protobuf.DeterministicLevel

StoreTypes.ExecutorMetrics

Protobuf type org.apache.spark.status.protobuf.ExecutorMetrics

StoreTypes.ExecutorMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorMetrics

StoreTypes.ExecutorMetricsDistributions

Protobuf type org.apache.spark.status.protobuf.ExecutorMetricsDistributions

StoreTypes.ExecutorMetricsDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorMetricsDistributions

StoreTypes.ExecutorMetricsDistributionsOrBuilder

StoreTypes.ExecutorMetricsOrBuilder

StoreTypes.ExecutorPeakMetricsDistributions

Protobuf type org.apache.spark.status.protobuf.ExecutorPeakMetricsDistributions

StoreTypes.ExecutorPeakMetricsDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorPeakMetricsDistributions

StoreTypes.ExecutorPeakMetricsDistributionsOrBuilder

StoreTypes.ExecutorResourceRequest

Protobuf type org.apache.spark.status.protobuf.ExecutorResourceRequest

StoreTypes.ExecutorResourceRequest.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorResourceRequest

StoreTypes.ExecutorResourceRequestOrBuilder

StoreTypes.ExecutorStageSummary

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummary

StoreTypes.ExecutorStageSummary.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummary

StoreTypes.ExecutorStageSummaryOrBuilder

StoreTypes.ExecutorStageSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummaryWrapper

StoreTypes.ExecutorStageSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorStageSummaryWrapper

StoreTypes.ExecutorStageSummaryWrapperOrBuilder

StoreTypes.ExecutorSummary

Protobuf type org.apache.spark.status.protobuf.ExecutorSummary

StoreTypes.ExecutorSummary.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorSummary

StoreTypes.ExecutorSummaryOrBuilder

StoreTypes.ExecutorSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.ExecutorSummaryWrapper

StoreTypes.ExecutorSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ExecutorSummaryWrapper

StoreTypes.ExecutorSummaryWrapperOrBuilder

StoreTypes.InputMetricDistributions

Protobuf type org.apache.spark.status.protobuf.InputMetricDistributions

StoreTypes.InputMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.InputMetricDistributions

StoreTypes.InputMetricDistributionsOrBuilder

StoreTypes.InputMetrics

Protobuf type org.apache.spark.status.protobuf.InputMetrics

StoreTypes.InputMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.InputMetrics

StoreTypes.InputMetricsOrBuilder

StoreTypes.JobData

Protobuf type org.apache.spark.status.protobuf.JobData

StoreTypes.JobData.Builder

Protobuf type org.apache.spark.status.protobuf.JobData

StoreTypes.JobDataOrBuilder

StoreTypes.JobDataWrapper

Protobuf type org.apache.spark.status.protobuf.JobDataWrapper

StoreTypes.JobDataWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.JobDataWrapper

StoreTypes.JobDataWrapperOrBuilder

StoreTypes.JobExecutionStatus

Protobuf enum org.apache.spark.status.protobuf.JobExecutionStatus

StoreTypes.MemoryMetrics

Protobuf type org.apache.spark.status.protobuf.MemoryMetrics

StoreTypes.MemoryMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.MemoryMetrics

StoreTypes.MemoryMetricsOrBuilder

StoreTypes.OutputMetricDistributions

Protobuf type org.apache.spark.status.protobuf.OutputMetricDistributions

StoreTypes.OutputMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.OutputMetricDistributions

StoreTypes.OutputMetricDistributionsOrBuilder

StoreTypes.OutputMetrics

Protobuf type org.apache.spark.status.protobuf.OutputMetrics

StoreTypes.OutputMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.OutputMetrics

StoreTypes.OutputMetricsOrBuilder

StoreTypes.PairStrings

Protobuf type org.apache.spark.status.protobuf.PairStrings

StoreTypes.PairStrings.Builder

Protobuf type org.apache.spark.status.protobuf.PairStrings

StoreTypes.PairStringsOrBuilder

StoreTypes.PoolData

Protobuf type org.apache.spark.status.protobuf.PoolData

StoreTypes.PoolData.Builder

Protobuf type org.apache.spark.status.protobuf.PoolData

StoreTypes.PoolDataOrBuilder

StoreTypes.ProcessSummary

Protobuf type org.apache.spark.status.protobuf.ProcessSummary

StoreTypes.ProcessSummary.Builder

Protobuf type org.apache.spark.status.protobuf.ProcessSummary

StoreTypes.ProcessSummaryOrBuilder

StoreTypes.ProcessSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.ProcessSummaryWrapper

StoreTypes.ProcessSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ProcessSummaryWrapper

StoreTypes.ProcessSummaryWrapperOrBuilder

StoreTypes.RDDDataDistribution

Protobuf type org.apache.spark.status.protobuf.RDDDataDistribution

StoreTypes.RDDDataDistribution.Builder

Protobuf type org.apache.spark.status.protobuf.RDDDataDistribution

StoreTypes.RDDDataDistributionOrBuilder

StoreTypes.RDDOperationClusterWrapper

Protobuf type org.apache.spark.status.protobuf.RDDOperationClusterWrapper

StoreTypes.RDDOperationClusterWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationClusterWrapper

StoreTypes.RDDOperationClusterWrapperOrBuilder

StoreTypes.RDDOperationEdge

Protobuf type org.apache.spark.status.protobuf.RDDOperationEdge

StoreTypes.RDDOperationEdge.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationEdge

StoreTypes.RDDOperationEdgeOrBuilder

StoreTypes.RDDOperationGraphWrapper

Protobuf type org.apache.spark.status.protobuf.RDDOperationGraphWrapper

StoreTypes.RDDOperationGraphWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationGraphWrapper

StoreTypes.RDDOperationGraphWrapperOrBuilder

StoreTypes.RDDOperationNode

Protobuf type org.apache.spark.status.protobuf.RDDOperationNode

StoreTypes.RDDOperationNode.Builder

Protobuf type org.apache.spark.status.protobuf.RDDOperationNode

StoreTypes.RDDOperationNodeOrBuilder

StoreTypes.RDDPartitionInfo

Protobuf type org.apache.spark.status.protobuf.RDDPartitionInfo

StoreTypes.RDDPartitionInfo.Builder

Protobuf type org.apache.spark.status.protobuf.RDDPartitionInfo

StoreTypes.RDDPartitionInfoOrBuilder

StoreTypes.RDDStorageInfo

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfo

StoreTypes.RDDStorageInfo.Builder

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfo

StoreTypes.RDDStorageInfoOrBuilder

StoreTypes.RDDStorageInfoWrapper

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfoWrapper

StoreTypes.RDDStorageInfoWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.RDDStorageInfoWrapper

StoreTypes.RDDStorageInfoWrapperOrBuilder

StoreTypes.ResourceInformation

Protobuf type org.apache.spark.status.protobuf.ResourceInformation

StoreTypes.ResourceInformation.Builder

Protobuf type org.apache.spark.status.protobuf.ResourceInformation

StoreTypes.ResourceInformationOrBuilder

StoreTypes.ResourceProfileInfo

Protobuf type org.apache.spark.status.protobuf.ResourceProfileInfo

StoreTypes.ResourceProfileInfo.Builder

Protobuf type org.apache.spark.status.protobuf.ResourceProfileInfo

StoreTypes.ResourceProfileInfoOrBuilder

StoreTypes.ResourceProfileWrapper

Protobuf type org.apache.spark.status.protobuf.ResourceProfileWrapper

StoreTypes.ResourceProfileWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.ResourceProfileWrapper

StoreTypes.ResourceProfileWrapperOrBuilder

StoreTypes.RuntimeInfo

Protobuf type org.apache.spark.status.protobuf.RuntimeInfo

StoreTypes.RuntimeInfo.Builder

Protobuf type org.apache.spark.status.protobuf.RuntimeInfo

StoreTypes.RuntimeInfoOrBuilder

StoreTypes.ShufflePushReadMetricDistributions

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetricDistributions

StoreTypes.ShufflePushReadMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetricDistributions

StoreTypes.ShufflePushReadMetricDistributionsOrBuilder

StoreTypes.ShufflePushReadMetrics

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetrics

StoreTypes.ShufflePushReadMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ShufflePushReadMetrics

StoreTypes.ShufflePushReadMetricsOrBuilder

StoreTypes.ShuffleReadMetricDistributions

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetricDistributions

StoreTypes.ShuffleReadMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetricDistributions

StoreTypes.ShuffleReadMetricDistributionsOrBuilder

StoreTypes.ShuffleReadMetrics

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetrics

StoreTypes.ShuffleReadMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleReadMetrics

StoreTypes.ShuffleReadMetricsOrBuilder

StoreTypes.ShuffleWriteMetricDistributions

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetricDistributions

StoreTypes.ShuffleWriteMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetricDistributions

StoreTypes.ShuffleWriteMetricDistributionsOrBuilder

StoreTypes.ShuffleWriteMetrics

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetrics

StoreTypes.ShuffleWriteMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.ShuffleWriteMetrics

StoreTypes.ShuffleWriteMetricsOrBuilder

StoreTypes.SinkProgress

Protobuf type org.apache.spark.status.protobuf.SinkProgress

StoreTypes.SinkProgress.Builder

Protobuf type org.apache.spark.status.protobuf.SinkProgress

StoreTypes.SinkProgressOrBuilder

StoreTypes.SourceProgress

Protobuf type org.apache.spark.status.protobuf.SourceProgress

StoreTypes.SourceProgress.Builder

Protobuf type org.apache.spark.status.protobuf.SourceProgress

StoreTypes.SourceProgressOrBuilder

StoreTypes.SparkPlanGraphClusterWrapper

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphClusterWrapper

StoreTypes.SparkPlanGraphClusterWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphClusterWrapper

StoreTypes.SparkPlanGraphClusterWrapperOrBuilder

StoreTypes.SparkPlanGraphEdge

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphEdge

StoreTypes.SparkPlanGraphEdge.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphEdge

StoreTypes.SparkPlanGraphEdgeOrBuilder

StoreTypes.SparkPlanGraphNode

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNode

StoreTypes.SparkPlanGraphNode.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNode

StoreTypes.SparkPlanGraphNodeOrBuilder

StoreTypes.SparkPlanGraphNodeWrapper

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNodeWrapper

StoreTypes.SparkPlanGraphNodeWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphNodeWrapper

StoreTypes.SparkPlanGraphNodeWrapper.WrapperCase

StoreTypes.SparkPlanGraphNodeWrapperOrBuilder

StoreTypes.SparkPlanGraphWrapper

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphWrapper

StoreTypes.SparkPlanGraphWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SparkPlanGraphWrapper

StoreTypes.SparkPlanGraphWrapperOrBuilder

StoreTypes.SpeculationStageSummary

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummary

StoreTypes.SpeculationStageSummary.Builder

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummary

StoreTypes.SpeculationStageSummaryOrBuilder

StoreTypes.SpeculationStageSummaryWrapper

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummaryWrapper

StoreTypes.SpeculationStageSummaryWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.SpeculationStageSummaryWrapper

StoreTypes.SpeculationStageSummaryWrapperOrBuilder

StoreTypes.SQLExecutionUIData

Protobuf type org.apache.spark.status.protobuf.SQLExecutionUIData

StoreTypes.SQLExecutionUIData.Builder

Protobuf type org.apache.spark.status.protobuf.SQLExecutionUIData

StoreTypes.SQLExecutionUIDataOrBuilder

StoreTypes.SQLPlanMetric

Protobuf type org.apache.spark.status.protobuf.SQLPlanMetric

StoreTypes.SQLPlanMetric.Builder

Protobuf type org.apache.spark.status.protobuf.SQLPlanMetric

StoreTypes.SQLPlanMetricOrBuilder

StoreTypes.StageData

Protobuf type org.apache.spark.status.protobuf.StageData

StoreTypes.StageData.Builder

Protobuf type org.apache.spark.status.protobuf.StageData

StoreTypes.StageDataOrBuilder

StoreTypes.StageDataWrapper

Protobuf type org.apache.spark.status.protobuf.StageDataWrapper

StoreTypes.StageDataWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.StageDataWrapper

StoreTypes.StageDataWrapperOrBuilder

StoreTypes.StageStatus

Protobuf enum org.apache.spark.status.protobuf.StageStatus

StoreTypes.StateOperatorProgress

Protobuf type org.apache.spark.status.protobuf.StateOperatorProgress

StoreTypes.StateOperatorProgress.Builder

Protobuf type org.apache.spark.status.protobuf.StateOperatorProgress

StoreTypes.StateOperatorProgressOrBuilder

StoreTypes.StreamBlockData

Protobuf type org.apache.spark.status.protobuf.StreamBlockData

StoreTypes.StreamBlockData.Builder

Protobuf type org.apache.spark.status.protobuf.StreamBlockData

StoreTypes.StreamBlockDataOrBuilder

StoreTypes.StreamingQueryData

Protobuf type org.apache.spark.status.protobuf.StreamingQueryData

StoreTypes.StreamingQueryData.Builder

Protobuf type org.apache.spark.status.protobuf.StreamingQueryData

StoreTypes.StreamingQueryDataOrBuilder

StoreTypes.StreamingQueryProgress

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgress

StoreTypes.StreamingQueryProgress.Builder

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgress

StoreTypes.StreamingQueryProgressOrBuilder

StoreTypes.StreamingQueryProgressWrapper

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgressWrapper

StoreTypes.StreamingQueryProgressWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.StreamingQueryProgressWrapper

StoreTypes.StreamingQueryProgressWrapperOrBuilder

StoreTypes.TaskData

Protobuf type org.apache.spark.status.protobuf.TaskData

StoreTypes.TaskData.Builder

Protobuf type org.apache.spark.status.protobuf.TaskData

StoreTypes.TaskDataOrBuilder

StoreTypes.TaskDataWrapper

Protobuf type org.apache.spark.status.protobuf.TaskDataWrapper

StoreTypes.TaskDataWrapper.Builder

Protobuf type org.apache.spark.status.protobuf.TaskDataWrapper

StoreTypes.TaskDataWrapperOrBuilder

StoreTypes.TaskMetricDistributions

Protobuf type org.apache.spark.status.protobuf.TaskMetricDistributions

StoreTypes.TaskMetricDistributions.Builder

Protobuf type org.apache.spark.status.protobuf.TaskMetricDistributions

StoreTypes.TaskMetricDistributionsOrBuilder

StoreTypes.TaskMetrics

Protobuf type org.apache.spark.status.protobuf.TaskMetrics

StoreTypes.TaskMetrics.Builder

Protobuf type org.apache.spark.status.protobuf.TaskMetrics

StoreTypes.TaskMetricsOrBuilder

StoreTypes.TaskResourceRequest

Protobuf type org.apache.spark.status.protobuf.TaskResourceRequest

StoreTypes.TaskResourceRequest.Builder

Protobuf type org.apache.spark.status.protobuf.TaskResourceRequest

StoreTypes.TaskResourceRequestOrBuilder

Strategy

Stores all the configuration options for tree construction. Parameter algo: the learning goal.

StratifiedSamplingUtils

Auxiliary functions and data structures for the sampleByKey method in PairRDDFunctions.

StreamBlockId

StreamingConf

StreamingContext

Deprecated.

This is deprecated as of Spark 3.4.0.

StreamingContextPythonHelper

StreamingContextState

:: DeveloperApi :: Represents the state of a StreamingContext.

StreamingDataWriterFactory

A factory of DataWriter returned by StreamingWrite.createStreamingWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing the actual data writer at executor side.

StreamingFlow

A Flow that represents stateful movement of data to some target.

StreamingFlowExecution

A 'FlowExecution' that processes data statefully using Structured Streaming.

StreamingKMeans

StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data.

StreamingKMeansModel

StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.
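The weighted update described above can be sketched in plain Java. This is a simplified illustration, not Spark's implementation: the class name StreamingCentersSketch and its fields are hypothetical, and the sketch assumes no decay or forgetfulness, so each cluster's weight is simply the count of points it has absorbed so far.

```java
import java.util.Arrays;

// Hypothetical sketch: one batch update of weighted cluster centers (no decay).
class StreamingCentersSketch {
    final double[][] centers; // one row per cluster center
    final double[] weights;   // number of points each center has absorbed so far

    StreamingCentersSketch(double[][] centers, double[] weights) {
        this.centers = centers;
        this.weights = weights;
    }

    // Index of the center closest to p by squared Euclidean distance.
    int nearest(double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Single k-means iteration over one batch: assign points, then move each
    // center to the weighted mean of its old position and the new points.
    void update(double[][] batch) {
        int k = centers.length;
        int d = centers[0].length;
        double[][] sums = new double[k][d];
        double[] counts = new double[k];
        for (double[] p : batch) {
            int c = nearest(p);
            counts[c] += 1.0;
            for (int j = 0; j < d; j++) {
                sums[c][j] += p[j];
            }
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0.0) {
                continue; // no new points for this cluster in the batch
            }
            double total = weights[c] + counts[c];
            for (int j = 0; j < d; j++) {
                centers[c][j] = (centers[c][j] * weights[c] + sums[c][j]) / total;
            }
            weights[c] = total;
        }
    }

    public static void main(String[] args) {
        StreamingCentersSketch m = new StreamingCentersSketch(
            new double[][] {{0, 0}, {10, 10}}, new double[] {1, 1});
        m.update(new double[][] {{2, 0}, {10, 12}});
        System.out.println(Arrays.deepToString(m.centers)); // [[1.0, 0.0], [10.0, 11.0]]
    }
}
```

Carrying a weight per cluster lets each batch be folded in without revisiting earlier data: the old center counts as that many earlier points.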

StreamingLinearAlgorithm<M extends GeneralizedLinearModel,A extends GeneralizedLinearAlgorithm<M>>

StreamingLinearAlgorithm implements methods for continuously training a generalized linear model on streaming data, and using it for prediction on (possibly different) streaming data.

StreamingLinearRegressionWithSGD

Train or predict a linear regression model on streaming data.

StreamingListener

:: DeveloperApi :: A listener interface for receiving information about an ongoing streaming computation.

StreamingListenerBatchCompleted

StreamingListenerBatchStarted

StreamingListenerBatchSubmitted

StreamingListenerEvent

:: DeveloperApi :: Base trait for events related to StreamingListener

StreamingListenerOutputOperationCompleted

StreamingListenerOutputOperationStarted

StreamingListenerReceiverError

StreamingListenerReceiverStarted

StreamingListenerReceiverStopped

StreamingListenerStreamingStarted

StreamingLogisticRegressionWithSGD

Train or predict a logistic regression model on streaming data.

StreamingQuery

A handle to a query that is executing continuously in the background as new data arrives.

StreamingQueryException

Exception that stopped a StreamingQuery.

StreamingQueryListener

Interface for listening to events related to StreamingQueries.

StreamingQueryListener.Event

Base type of StreamingQueryListener events

StreamingQueryListener.QueryIdleEvent

Event representing that the query is idle and waiting for new data to process.

StreamingQueryListener.QueryIdleEvent$

StreamingQueryListener.QueryProgressEvent

Event representing any progress updates in a query.

StreamingQueryListener.QueryProgressEvent$

StreamingQueryListener.QueryStartedEvent

Event representing the start of a query. Parameter id: a unique query id that persists across restarts.

StreamingQueryListener.QueryStartedEvent$

StreamingQueryListener.QueryTerminatedEvent

Event representing the termination of a query.

StreamingQueryListener.QueryTerminatedEvent$

StreamingQueryManager

A class to manage all the StreamingQuery instances active in a SparkSession.

StreamingQueryProgress

Information about progress made in the execution of a StreamingQuery during a trigger.

StreamingQueryProgressSerializer

StreamingQueryStatus

Reports information about the instantaneous status of a streaming query.

StreamingReadOptions

Options for a streaming read of an input.

StreamingStatistics

StreamingTableWrite

A `StreamingFlowExecution` that writes a streaming `DataFrame` to a `Table`.

StreamingTest

Performs online 2-sample significance testing for a stream of (Boolean, Double) pairs.

StreamingTestMethod

Significance testing methods for StreamingTest.

StreamingWrite

An interface that defines how to write data to a data source in streaming queries.

StreamInputInfo

:: DeveloperApi :: Tracks the information of an input stream at a specified batch time.

StreamListener

A streaming listener that converts streaming events into pipeline events for the relevant flows.

StreamSinkProvider

::Experimental:: Implemented by objects that can produce a streaming Sink for a specific format or system.

StreamSourceProvider

::Experimental:: Implemented by objects that can produce a streaming Source for a specific format or system.

StringArrayParam

Specialized version of Param[Array[String]] for Java.

StringConstraint

StringContains

A filter that evaluates to true iff the attribute evaluates to a string that contains the string value.

StringEndsWith

A filter that evaluates to true iff the attribute evaluates to a string that ends with value.

StringHelper

StringIndexer

A label indexer that maps string column(s) of labels to ML column(s) of label indices.
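The idea of label indexing can be sketched in plain Java without Spark. This is an illustration under stated assumptions, not the StringIndexer API: the class LabelIndexSketch and its fit method are hypothetical, and the sketch assumes indices are assigned by descending label frequency, with ties broken alphabetically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: map each distinct string label to an integer index.
class LabelIndexSketch {
    // Most frequent label gets index 0; equal frequencies are ordered alphabetically.
    static Map<String, Integer> fit(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        List<String> ordered = new ArrayList<>(counts.keySet());
        ordered.sort(
            Comparator.<String>comparingInt(l -> -counts.get(l))
                .thenComparing(Comparator.<String>naturalOrder()));
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            index.put(ordered.get(i), i);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = fit(List.of("a", "b", "c", "a", "a", "c"));
        System.out.println(index); // {a=0, c=1, b=2}
    }
}
```

Once fitted, transforming a column amounts to looking up each label in the returned map.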

StringIndexerBase

Base trait for StringIndexer and StringIndexerModel.

StringIndexerModel

Model fitted by StringIndexer.

StringIndexerModel.Data$

StringRRDD<T>

An RDD that stores R objects as Array[String].

StringStartsWith

A filter that evaluates to true iff the attribute evaluates to a string that starts with value.

StringSubstitutor

StringType

The data type representing String values.

StringTypeExpression

StronglyConnectedComponents

Strongly connected components algorithm implementation.

StructField

A field inside a StructType.

StructType

A StructType object can be constructed by

StudentTTest

Performs Student's 2-sample t-test.

Success

:: DeveloperApi :: Task succeeded.

Sum

An aggregate function that returns the summation of all the values in a group.

Summarizer

Tools for vectorized statistics on MLlib Vectors.

Summary

Trait for the Summary. All the summaries should extend from this Summary in order to support connect.

SummaryBuilder

A builder object that provides summary statistics about a given column.

SupportsAdmissionControl

A mix-in interface for SparkDataStream streaming sources to signal that they can control the rate of data ingested into the system.

SupportsAtomicPartitionManagement

An atomic partition interface of Table to operate multiple partitions atomically.

SupportsCatalogOptions

An interface, which TableProviders can implement, to support table existence checks and creation through a catalog, without having to use table identifiers.

SupportsDelete

A mix-in interface for Table delete support.

SupportsDeleteV2

A mix-in interface for Table delete support.

SupportsDelta

A mix-in interface for RowLevelOperation.

SupportsDynamicOverwrite

Write builder trait for tables that support dynamic partition overwrite.

SupportsIndex

Table methods for working with indexes.

SupportsMetadataColumns

An interface for exposing data columns for a table that are not in the table schema.

SupportsNamespaces

Catalog methods for working with namespaces.

SupportsOverwrite

Write builder trait for tables that support overwrite by filter.

SupportsOverwriteV2

Write builder trait for tables that support overwrite by filter.

SupportsPartitionManagement

A partition interface of Table.

SupportsPushDownAggregates