pyspark.pandas.DataFrame.nlargest
- DataFrame.nlargest(n, columns, keep='first')
- Return the first n rows ordered by columns in descending order.
- Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.
- This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant in pandas. In pandas-on-Spark, thanks to Spark's lazy execution and query optimizer, the two have the same performance.
- Parameters
- n : int
- Number of rows to return.
- columns : label or list of labels
- Column label(s) to order by.
- keep : {'first', 'last'}, default 'first'. 'all' is not implemented yet.
- Determines which duplicates (if any) to keep.
  - first: Keep the first occurrence.
  - last: Keep the last occurrence.
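- As a rough illustration of how keep affects ties, the following plain-Python sketch models the behaviour on a list of (label, value) pairs. It is not pandas-on-Spark code and says nothing about how the library implements the method; the helper name nlargest_rows is hypothetical and exists only for this example.

```python
def nlargest_rows(rows, n, keep="first"):
    """Plain-Python model of tie handling; `rows` is a list of (label, value) pairs.

    Hypothetical helper for illustration only, not part of pyspark.pandas.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # For keep='last', walk the rows back to front so later occurrences win ties.
    ordered = rows if keep == "first" else rows[::-1]
    # sorted() is stable, so rows with equal values keep the order they have in `ordered`.
    return sorted(ordered, key=lambda row: row[1], reverse=True)[:n]


rows = [("a", 1), ("b", 2), ("c", 2), ("d", 3), ("e", 3)]
first = nlargest_rows(rows, 3, keep="first")
last = nlargest_rows(rows, 3, keep="last")
print(first)  # [('d', 3), ('e', 3), ('b', 2)]
print(last)   # [('e', 3), ('d', 3), ('c', 2)]
```

- With keep='first' the tie between "b" and "c" goes to "b"; with keep='last' it goes to "c", and the tied rows "d" and "e" come out in reverse order.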
 
- Returns
- DataFrame
- The first n rows ordered by the given columns in descending order. 
 
- See also
- DataFrame.nsmallest
- Return the first n rows ordered by columns in ascending order. 
- DataFrame.sort_values
- Sort DataFrame by the values. 
- DataFrame.head
- Return the first n rows without re-ordering. 
- Notes
- This function cannot be used with all column types. For example, when specifying columns with object or category dtypes, TypeError is raised.

- Examples

>>> df = ps.DataFrame({'X': [1, 2, 3, 5, 6, 7, np.nan],
...                    'Y': [6, 7, 8, 9, 10, 11, 12]})
>>> df
     X   Y
0  1.0   6
1  2.0   7
2  3.0   8
3  5.0   9
4  6.0  10
5  7.0  11
6  NaN  12

- In the following example, we will use nlargest to select the three rows having the largest values in column "X".

>>> df.nlargest(n=3, columns='X')
     X   Y
5  7.0  11
4  6.0  10
3  5.0   9

- To order by the largest values in column "Y" and then "X", we can specify multiple columns like in the next example.

>>> df.nlargest(n=3, columns=['Y', 'X'])
     X   Y
6  NaN  12
5  7.0  11
4  6.0  10

- The examples below show how ties are resolved, which is decided by keep.

>>> tied_df = ps.DataFrame({'X': [1, 2, 2, 3, 3]}, index=['a', 'b', 'c', 'd', 'e'])
>>> tied_df
   X
a  1
b  2
c  2
d  3
e  3

- When using keep='first' (default), ties are resolved in order:

>>> tied_df.nlargest(3, 'X')
   X
d  3
e  3
b  2

>>> tied_df.nlargest(3, 'X', keep='first')
   X
d  3
e  3
b  2

- When using keep='last', ties are resolved in reverse order:

>>> tied_df.nlargest(3, 'X', keep='last')
   X
e  3
d  3
c  2
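
- The equivalence with sort_values(columns, ascending=False).head(n) stated in the description can be pictured with a small standard-library analogy: sorting everything and slicing gives the same rows as a dedicated top-n selection. The sketch below uses plain Python (sorted and heapq.nlargest), not pandas-on-Spark, and the (index, X) tuples are an assumed stand-in for the non-NaN rows of df above.

```python
import heapq

# (index, X) pairs standing in for the non-NaN rows of the example DataFrame.
rows = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 5.0), (4, 6.0), (5, 7.0)]

# "Sort everything, then take the head": analogous to sort_values(...).head(3).
by_sort = sorted(rows, key=lambda row: row[1], reverse=True)[:3]

# Dedicated top-n selection: analogous to nlargest(3, 'X').
by_heap = heapq.nlargest(3, rows, key=lambda row: row[1])

print(by_sort)  # [(5, 7.0), (4, 6.0), (3, 5.0)]
print(by_heap)  # [(5, 7.0), (4, 6.0), (3, 5.0)]
```

- Both approaches select indices 5, 4 and 3, matching the df.nlargest(n=3, columns='X') output above. The row with NaN is left out of the analogy because NaN comparisons in plain Python do not follow the same ordering rules.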