Legacy Entry Points

SQLContext was the primary entry point for Spark SQL in Spark 1.x. As of Spark 2.0, it has been replaced by SparkSession. SQLContext and HiveContext are retained for backward compatibility only.

Deprecated since version 3.0.0: Use SparkSession.builder.getOrCreate() instead.

Note

Under Spark Connect, SQLContext.registerJavaFunction() and the entire HiveContext class are not supported and raise PySparkNotImplementedError, because they rely on a JVM SparkContext that does not exist in Connect mode.

SQLContext

SQLContext(sparkContext[, sparkSession, ...])

The entry point for working with structured data (rows and columns) in Spark 1.x.

SQLContext.getOrCreate([sc])

Gets the existing SQLContext or creates a new one with the given SparkContext.

SQLContext.newSession()

Returns a new SQLContext as a new session that has its own SQLConf, registered temporary views, and UDFs, but shares the SparkContext and table cache.

SQLContext.setConf(key, value)

Sets the given Spark SQL configuration property.

SQLContext.getConf(key[, defaultValue])

Returns the value of Spark SQL configuration property for the given key.

SQLContext.udf

Returns a UDFRegistration for UDF registration.

SQLContext.udtf

Returns a UDTFRegistration for UDTF registration.

SQLContext.range(start[, end, step, ...])

Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.

SQLContext.registerFunction(name, f[, ...])

An alias for spark.udf.register().

SQLContext.registerJavaFunction(name, ...[, ...])

An alias for spark.udf.registerJavaFunction().

SQLContext.createDataFrame(data[, schema, ...])

Creates a DataFrame from an RDD, a list, a pandas.DataFrame, or a pyarrow.Table.

SQLContext.registerDataFrameAsTable(df, ...)

Registers the given DataFrame as a temporary table in the catalog.

SQLContext.dropTempTable(tableName)

Removes the temporary table from the catalog.

SQLContext.createExternalTable(tableName[, ...])

Creates an external table based on the dataset in a data source.

SQLContext.sql(sqlQuery)

Returns a DataFrame representing the result of the given query.

SQLContext.table(tableName)

Returns the specified table or view as a DataFrame.

SQLContext.tables([dbName])

Returns a DataFrame containing names of tables in the given database.

SQLContext.tableNames([dbName])

Returns a list of names of tables in the database dbName.

SQLContext.cacheTable(tableName)

Caches the specified table in memory.

SQLContext.uncacheTable(tableName)

Removes the specified table from the in-memory cache.

SQLContext.clearCache()

Removes all cached tables from the in-memory cache.

SQLContext.read

Returns a DataFrameReader that can be used to read data in as a DataFrame.

SQLContext.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.

SQLContext.streams

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.

HiveContext

HiveContext(sparkContext[, sparkSession, ...])

A variant of Spark SQL that integrates with data stored in Hive.