Legacy Entry Points

SQLContext was the primary entry point for Spark SQL in Spark 1.x, and HiveContext was its Hive-integrated variant. As of Spark 2.0, both have been replaced by SparkSession; the classes are retained for backward compatibility only.

Deprecated since version 3.0.0: Use SparkSession.builder.getOrCreate() instead.
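
A minimal migration sketch, assuming a local PySpark installation. The helper names `build_session` and `even_ids`, the view name `ids`, and the `local[1]` master are illustrative and not part of the API. Each comment names the legacy SQLContext call that the SparkSession-era call stands in for.

```python
def build_session(app_name="legacy-migration-sketch"):
    """Create (or reuse) a local SparkSession, the replacement for SQLContext."""
    # Imported lazily so that even_ids() below can be exercised without a JVM.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master("local[1]").appName(app_name).getOrCreate()


def even_ids(spark, upper):
    """Return the even ids in [0, upper) using only SparkSession-era calls."""
    df = spark.range(0, upper)            # was: sqlContext.range(0, upper)
    df.createOrReplaceTempView("ids")     # was: sqlContext.registerDataFrameAsTable(df, "ids")
    rows = spark.sql(                     # was: sqlContext.sql(...)
        "SELECT id FROM ids WHERE id % 2 = 0 ORDER BY id"
    ).collect()
    spark.catalog.dropTempView("ids")     # was: sqlContext.dropTempTable("ids")
    return [row["id"] for row in rows]
```

With a working local installation, `even_ids(build_session(), 10)` should return `[0, 2, 4, 6, 8]`.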

Note

Under Spark Connect, SQLContext.registerJavaFunction() and all of HiveContext are unsupported and raise PySparkNotImplementedError, because they rely on a JVM SparkContext, which does not exist in Connect mode.

SQLContext

SQLContext(sparkContext[, sparkSession, ...])

The entry point for working with structured data (rows and columns) in Spark 1.x.

`SQLContext.getOrCreate`([sc])	Gets the existing SQLContext or creates a new one with the given SparkContext.
`SQLContext.newSession`()	Returns a new SQLContext as a new session that has its own SQLConf, registered temporary views, and UDFs, but shares the SparkContext and table cache.
`SQLContext.setConf`(key, value)	Sets the given Spark SQL configuration property.
`SQLContext.getConf`(key[, defaultValue])	Returns the value of Spark SQL configuration property for the given key.
`SQLContext.udf`	Returns a `UDFRegistration` for UDF registration.
`SQLContext.udtf`	Returns a `UDTFRegistration` for UDTF registration.
`SQLContext.range`(start[, end, step, ...])	Creates a `DataFrame` with a single `pyspark.sql.types.LongType` column named `id`, containing elements in a range from `start` to `end` (exclusive) with step value `step`.
`SQLContext.registerFunction`(name, f[, ...])	An alias for `spark.udf.register()`.
`SQLContext.registerJavaFunction`(name, ...[, ...])	An alias for `spark.udf.registerJavaFunction()`.
`SQLContext.createDataFrame`(data[, schema, ...])	Creates a `DataFrame` from an `RDD`, a list, a `pandas.DataFrame`, or a `pyarrow.Table`.
`SQLContext.registerDataFrameAsTable`(df, ...)	Registers the given `DataFrame` as a temporary table in the catalog.
`SQLContext.dropTempTable`(tableName)	Removes the temporary table from the catalog.
`SQLContext.createExternalTable`(tableName[, ...])	Creates an external table based on the dataset in a data source.
`SQLContext.sql`(sqlQuery)	Returns a `DataFrame` representing the result of the given query.
`SQLContext.table`(tableName)	Returns the specified table or view as a `DataFrame`.
`SQLContext.tables`([dbName])	Returns a `DataFrame` containing names of tables in the given database.
`SQLContext.tableNames`([dbName])	Returns a list of names of tables in the database `dbName`.
`SQLContext.cacheTable`(tableName)	Caches the specified table in-memory.
`SQLContext.uncacheTable`(tableName)	Removes the specified table from the in-memory cache.
`SQLContext.clearCache`()	Removes all cached tables from the in-memory cache.
`SQLContext.read`	Returns a `DataFrameReader` that can be used to read data in as a `DataFrame`.
`SQLContext.readStream`	Returns a `DataStreamReader` that can be used to read data streams as a streaming `DataFrame`.
`SQLContext.streams`	Returns a `StreamingQueryManager` that allows managing all the `StreamingQuery` instances active on this context.
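
The isolation that `newSession()` provides can be sketched as follows, shown on SparkSession, which exposes the same `newSession()` method. The function name `session_isolation` and the view name `parent_only` are hypothetical, and `spark` is assumed to be an existing (non-Connect) SparkSession.

```python
def session_isolation(spark):
    """Check that newSession() isolates temp views and SQL conf from its parent."""
    other = spark.newSession()  # counterpart of SQLContext.newSession()

    # A temporary view registered in the parent should be invisible to `other`.
    spark.range(3).createOrReplaceTempView("parent_only")
    parent_views = {t.name for t in spark.catalog.listTables() if t.isTemporary}
    other_views = {t.name for t in other.catalog.listTables() if t.isTemporary}

    # A runtime SQL conf changed in the parent should leave `other` untouched.
    key = "spark.sql.shuffle.partitions"
    original = spark.conf.get(key)
    other_before = other.conf.get(key)
    spark.conf.set(key, str(int(original) + 1))
    parent_changed = spark.conf.get(key) != original
    other_unchanged = other.conf.get(key) == other_before

    # Restore the parent session to its previous state.
    spark.conf.set(key, original)
    spark.catalog.dropTempView("parent_only")

    return {
        "parent_sees_view": "parent_only" in parent_views,
        "other_sees_view": "parent_only" in other_views,
        "conf_isolated": parent_changed and other_unchanged,
    }
```

On a classic session the expectation is that only `other_sees_view` is false; the SparkContext and the table cache remain shared between the two sessions.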

HiveContext

HiveContext(sparkContext[, sparkSession, ...])

A variant of Spark SQL that integrates with data stored in Hive.
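
In SparkSession-based code, the counterpart of HiveContext is enabling Hive support on the session builder. A minimal sketch, assuming a Spark build that includes the Hive classes; the function name `build_hive_session` and its `builder` parameter (handy for injecting a test double) are hypothetical.

```python
def build_hive_session(app_name="hive-session-sketch", builder=None):
    """Return a Hive-enabled SparkSession instead of constructing a HiveContext."""
    if builder is None:
        # Imported lazily so the function can be exercised with a stand-in builder.
        from pyspark.sql import SparkSession

        builder = SparkSession.builder

    return (
        builder.appName(app_name)
        .enableHiveSupport()  # replaces HiveContext(sparkContext)
        .getOrCreate()
    )
```

Calling `build_hive_session()` with no arguments uses `SparkSession.builder`; if a session already exists in the process, `getOrCreate()` returns it rather than creating a new one.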