pyspark.sql.DataFrame.dropna
- DataFrame.dropna(how='any', thresh=None, subset=None)
Returns a new DataFrame omitting rows with null or NaN values.
DataFrame.dropna() and DataFrameNaFunctions.drop() are aliases of each other.
New in version 1.3.1.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- how: str, optional, default ‘any’
Either ‘any’ or ‘all’. If ‘any’, drop a row if it contains any null or NaN value. If ‘all’, drop a row only if all of its values are null or NaN.
- thresh: int, optional, default None
If specified, drop rows that have fewer than thresh non-null, non-NaN values. This overrides the how parameter.
- subset: str, tuple or list, optional
Column names to consider. If omitted, all columns are considered.
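How the three parameters interact can be sketched in plain Python. This is a simplified model of the rule described above, not Spark's implementation; the helper names is_missing and keep_row are hypothetical, and rows are modelled as dicts:

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (the 'null or NaN' cases)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_row(row, how="any", thresh=None, subset=None):
    """Return True if a row (a dict) would survive the drop rule."""
    if subset is None:
        cols = list(row)          # consider every column
    elif isinstance(subset, str):
        cols = [subset]           # a single column name
    else:
        cols = list(subset)
    if thresh is None:
        # 'any' requires every considered value; 'all' requires at least one.
        thresh = len(cols) if how == "any" else 1
    present = sum(1 for c in cols if not is_missing(row[c]))
    return present >= thresh


rows = [
    {"age": 10, "height": 80.0, "name": "Alice"},
    {"age": 5, "height": float("nan"), "name": "Bob"},
    {"age": None, "height": None, "name": "Tom"},
    {"age": None, "height": float("nan"), "name": None},
]

kept_any = [r["name"] for r in rows if keep_row(r)]
kept_all = [r["name"] for r in rows if keep_row(r, how="all")]
# thresh takes precedence, so how="all" has no effect here.
kept_thresh = [r["name"] for r in rows if keep_row(r, how="all", thresh=2)]
kept_subset = [r["name"] for r in rows if keep_row(r, subset=["age", "name"])]

print(kept_any)     # ['Alice']
print(kept_all)     # ['Alice', 'Bob', 'Tom']
print(kept_thresh)  # ['Alice', 'Bob']
print(kept_subset)  # ['Alice', 'Bob']
```

In this model, how is only a shorthand for a threshold: ‘any’ means every considered column must hold a usable value, and ‘all’ means at least one must. That is why an explicit thresh makes how irrelevant.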
- Returns
DataFrame
A new DataFrame with the rows that meet the drop criteria for null or NaN values excluded.
Examples
>>> from pyspark.sql import Row
>>> df = spark.createDataFrame([
...     Row(age=10, height=80.0, name="Alice"),
...     Row(age=5, height=float("nan"), name="Bob"),
...     Row(age=None, height=None, name="Tom"),
...     Row(age=None, height=float("nan"), name=None),
... ])
Example 1: Drop the row if it contains any null or NaN.
>>> df.na.drop().show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|  80.0|Alice|
+---+------+-----+
Example 2: Drop the row only if all its values are null or NaN.
>>> df.na.drop(how='all').show()
+----+------+-----+
| age|height| name|
+----+------+-----+
|  10|  80.0|Alice|
|   5|   NaN|  Bob|
|NULL|  NULL|  Tom|
+----+------+-----+
Example 3: Drop rows that have fewer than thresh non-null and non-NaN values.
>>> df.na.drop(thresh=2).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|  80.0|Alice|
|  5|   NaN|  Bob|
+---+------+-----+
Example 4: Drop rows with null or NaN values in the specified columns.
>>> df.na.drop(subset=['age', 'name']).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|  80.0|Alice|
|  5|   NaN|  Bob|
+---+------+-----+