pyspark.sql.functions.from_json#

pyspark.sql.functions.from_json(col, schema, options=None)[source]#

Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType with the specified schema. Returns null in the case of an unparsable string.

New in version 2.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

a column or column name containing a JSON-formatted string

schema : DataType or str

a StructType, an ArrayType of StructType, or a Python string literal with a DDL-formatted string to use when parsing the JSON column

options : dict, optional

options to control parsing. Accepts the same options as the JSON datasource. See Data Source Option for the version you use.

Returns
Column

a new column of complex type parsed from the given JSON string.

Examples

Example 1: Parsing JSON with a specified schema

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql.types import StructType, StructField, IntegerType
>>> schema = StructType([StructField("a", IntegerType())])
>>> df = spark.createDataFrame([(1, '''{"a": 1}''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, schema).alias("json")).show()
+----+
|json|
+----+
| {1}|
+----+

Example 2: Parsing JSON with a DDL-formatted string

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1, '''{"a": 1}''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, "a INT").alias("json")).show()
+----+
|json|
+----+
| {1}|
+----+

Example 3: Parsing JSON into a MapType

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1, '''{"a": 1}''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, "MAP<STRING,INT>").alias("json")).show()
+--------+
|    json|
+--------+
|{a -> 1}|
+--------+

Example 4: Parsing JSON into an ArrayType of StructType

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType
>>> schema = ArrayType(StructType([StructField("a", IntegerType())]))
>>> df = spark.createDataFrame([(1, '''[{"a": 1}]''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, schema).alias("json")).show()
+-----+
| json|
+-----+
|[{1}]|
+-----+

Example 5: Parsing JSON into an ArrayType

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql.types import ArrayType, IntegerType
>>> schema = ArrayType(IntegerType())
>>> df = spark.createDataFrame([(1, '''[1, 2, 3]''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, schema).alias("json")).show()
+---------+
|     json|
+---------+
|[1, 2, 3]|
+---------+

Example 6: Parsing JSON with specified options

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1, '''{a:123}'''), (2, '''{"a":456}''')], ("key", "value"))
>>> parsed1 = sf.from_json(df.value, "a INT")
>>> parsed2 = sf.from_json(df.value, "a INT", {"allowUnquotedFieldNames": "true"})
>>> df.select("value", parsed1, parsed2).show()
+---------+----------------+----------------+
|    value|from_json(value)|from_json(value)|
+---------+----------------+----------------+
|  {a:123}|          {NULL}|           {123}|
|{"a":456}|           {456}|           {456}|
+---------+----------------+----------------+