pyspark.sql.functions.from_json

pyspark.sql.functions.from_json(col, schema, options=None)

Parses a column containing a JSON string into a MapType with StringType keys, a StructType, or an ArrayType, according to the specified schema. Returns null if the string cannot be parsed.

New in version 2.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col (Column or str): a column or column name containing JSON strings.
schema (DataType or str): a StructType, an ArrayType of StructType, or a Python string literal holding a DDL-formatted schema, used when parsing the JSON column.
options (dict, optional): options to control parsing. Accepts the same options as the JSON data source. See Data Source Option for the version you use.

Returns

Column: a new column of complex type parsed from the given JSON string.

Examples

Example 1: Parsing JSON with a specified schema

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql.types import StructType, StructField, IntegerType
>>> schema = StructType([StructField("a", IntegerType())])
>>> df = spark.createDataFrame([(1, '''{"a": 1}''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, schema).alias("json")).show()
+----+
|json|
+----+
| {1}|
+----+

Example 2: Parsing JSON with a DDL-formatted string

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1, '''{"a": 1}''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, "a INT").alias("json")).show()
+----+
|json|
+----+
| {1}|
+----+

Example 3: Parsing JSON into a MapType

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1, '''{"a": 1}''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, "MAP<STRING,INT>").alias("json")).show()
+--------+
|    json|
+--------+
|{a -> 1}|
+--------+

Example 4: Parsing JSON into an ArrayType of StructType

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType
>>> schema = ArrayType(StructType([StructField("a", IntegerType())]))
>>> df = spark.createDataFrame([(1, '''[{"a": 1}]''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, schema).alias("json")).show()
+-----+
| json|
+-----+
|[{1}]|
+-----+

Example 5: Parsing JSON into an ArrayType

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql.types import ArrayType, IntegerType
>>> schema = ArrayType(IntegerType())
>>> df = spark.createDataFrame([(1, '''[1, 2, 3]''')], ("key", "value"))
>>> df.select(sf.from_json(df.value, schema).alias("json")).show()
+---------+
|     json|
+---------+
|[1, 2, 3]|
+---------+

Example 6: Parsing JSON with specified options

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(1, '''{a:123}'''), (2, '''{"a":456}''')], ("key", "value"))
>>> parsed1 = sf.from_json(df.value, "a INT")
>>> parsed2 = sf.from_json(df.value, "a INT", {"allowUnquotedFieldNames": "true"})
>>> df.select("value", parsed1, parsed2).show()
+---------+----------------+----------------+
|    value|from_json(value)|from_json(value)|
+---------+----------------+----------------+
|  {a:123}|          {NULL}|           {123}|
|{"a":456}|           {456}|           {456}|
+---------+----------------+----------------+