pyspark.sql.table_arg.TableArg.orderBy
- TableArg.orderBy(*cols)
Orders the data within each partition by the specified columns. This method must be called after partitionBy() or withSinglePartition().
- Parameters
- cols : str, Column, or list
    Column names or Column objects to order by. Columns can be ordered in ascending or descending order using Column.asc() or Column.desc().
- Returns
- TableArg
    A new TableArg instance with the ordering applied.
Examples
>>> from pyspark.sql.functions import udtf
>>>
>>> @udtf(returnType="key: int, value: string")
... class ProcessUDTF:
...     def eval(self, row):
...         yield row["key"], row["value"]
...
>>> df = spark.createDataFrame(
...     [(1, "b"), (1, "a"), (2, "d"), (2, "c")], ["key", "value"]
... )
>>>
>>> # Order by a single column within partitions
>>> result = ProcessUDTF(df.asTable().partitionBy("key").orderBy("value"))
>>> result.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  1|    b|
|  2|    c|
|  2|    d|
+---+-----+
>>>
>>> # Order by multiple columns
>>> df2 = spark.createDataFrame(
...     [(1, "a", 2), (1, "a", 1), (1, "b", 3)], ["key", "value", "num"]
... )
>>> result2 = ProcessUDTF(df2.asTable().partitionBy("key").orderBy("value", "num"))
>>> result2.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  1|    a|
|  1|    b|
+---+-----+
>>>
>>> # Order in descending order
>>> result3 = ProcessUDTF(df.asTable().partitionBy("key").orderBy(df.value.desc()))
>>> result3.show()
+---+-----+
|key|value|
+---+-----+
|  1|    b|
|  1|    a|
|  2|    d|
|  2|    c|
+---+-----+