pyspark.sql.table_arg.TableArg.orderBy

TableArg.orderBy(*cols)

Orders the data within each partition by the specified columns.

This method orders the data within each partition, not globally across the table. It must be called after partitionBy() or withSinglePartition().

Parameters
cols : str, Column, or list

Column names or Column objects to order by. Columns can be ordered in ascending or descending order using Column.asc() or Column.desc().

Returns
TableArg

A new TableArg instance with ordering applied.

Examples

>>> from pyspark.sql.functions import udtf
>>>
>>> @udtf(returnType="key: int, value: string")
... class ProcessUDTF:
...     def eval(self, row):
...         yield row["key"], row["value"]
...
>>> df = spark.createDataFrame(
...     [(1, "b"), (1, "a"), (2, "d"), (2, "c")], ["key", "value"]
... )
>>>
>>> # Order by a single column within partitions
>>> result = ProcessUDTF(df.asTable().partitionBy("key").orderBy("value"))
>>> result.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  1|    b|
|  2|    c|
|  2|    d|
+---+-----+
>>>
>>> # Order by multiple columns
>>> df2 = spark.createDataFrame(
...     [(1, "a", 2), (1, "a", 1), (1, "b", 3)], ["key", "value", "num"]
... )
>>> result2 = ProcessUDTF(df2.asTable().partitionBy("key").orderBy("value", "num"))
>>> result2.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  1|    a|
|  1|    b|
+---+-----+
>>>
>>> # Order by descending order
>>> result3 = ProcessUDTF(df.asTable().partitionBy("key").orderBy(df.value.desc()))
>>> result3.show()
+---+-----+
|key|value|
+---+-----+
|  1|    b|
|  1|    a|
|  2|    d|
|  2|    c|
+---+-----+
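
The partition-then-order semantics shown in the examples above can be sketched in plain Python (illustrative only; the actual ordering is performed by Spark before rows reach the UDTF):

```python
from itertools import groupby

# Same rows as the first example: df with columns ["key", "value"].
rows = [(1, "b"), (1, "a"), (2, "d"), (2, "c")]

# partitionBy("key"): group rows by the partition key.
rows.sort(key=lambda r: r[0])  # bring equal keys together for groupby

# orderBy("value"): sort the rows within each partition.
partitions = {
    key: sorted(group, key=lambda r: r[1])
    for key, group in groupby(rows, key=lambda r: r[0])
}

# Each partition is now ordered by "value", matching result.show():
# partition 1 -> [(1, "a"), (1, "b")], partition 2 -> [(2, "c"), (2, "d")]
```

Passing `reverse=True` to the inner `sorted` call mirrors the descending example with `df.value.desc()`.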