pyspark.pandas.DataFrame.rank
- DataFrame.rank(method='average', ascending=True, numeric_only=False)
- Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values.
- Note
- The current implementation of rank uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
- average: average rank of group 
- min: lowest rank in group 
- max: highest rank in group 
- first: ranks assigned in order they appear in the array 
- dense: like ‘min’, but rank always increases by 1 between groups 
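The five tie-breaking rules above can be sketched in plain Python. This is only an illustration of the semantics, not how pandas-on-Spark computes ranks; `rank_values` is a hypothetical helper name introduced here, and the input is assumed to be a simple list of comparable values with no missing entries.

```python
def rank_values(values, method="average"):
    """Hypothetical helper: rank a list using one of the five tie-breaking rules."""
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")

    # Positions sorted by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0   # counter of distinct values seen so far
    start = 0   # 0-based sorted position where the current tie group begins

    while start < len(order):
        # Extend `end` to the last sorted position holding the same value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1

        # The group occupies 1-based ranks start+1 .. end+1.
        for offset, i in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[i] = (start + 1 + end + 1) / 2
            elif method == "min":
                ranks[i] = float(start + 1)
            elif method == "max":
                ranks[i] = float(end + 1)
            elif method == "first":
                ranks[i] = float(start + 1 + offset)
            else:  # dense
                ranks[i] = float(dense)
        start = end + 1

    return ranks


column_a = [1, 2, 2, 3]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_values(column_a, m))
```

For the tied pair in `[1, 2, 2, 3]`, 'average' yields 2.5 for both values, 'min' yields 2.0, 'max' yields 3.0, 'first' yields 2.0 and 3.0 in input order, and 'dense' yields 2.0 with the following value ranked 3.0 instead of 4.0.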
 
- ascending : bool, default True
- If False, rank from high (1) to low (N).
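As a rough illustration of descending ranks, the sketch below ranks a plain Python list so that the largest value receives rank 1, using the default 'average' tie rule. `rank_descending` is a hypothetical helper introduced for this example and is not part of the pandas-on-Spark API; it assumes a list of comparable values with no missing entries.

```python
def rank_descending(values):
    """Hypothetical helper: average ranks with the largest value ranked 1."""
    ranks = []
    for v in values:
        greater = sum(1 for other in values if other > v)
        equal = sum(1 for other in values if other == v)
        # Tied values occupy ranks greater+1 .. greater+equal; take their mean.
        ranks.append(greater + (equal + 1) / 2)
    return ranks


print(rank_descending([1, 2, 2, 3]))
print(rank_descending([4, 3, 2, 1]))
```

With `[1, 2, 2, 3]` the value 3 gets rank 1.0, the two tied 2s share rank 2.5, and the value 1 gets rank 4.0, which is the ascending result reversed.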
- numeric_only : bool, default False
- For DataFrame objects, rank only numeric columns if set to True.
- Changed in version 4.0.0: The default value of numeric_only is now False.
 
- Returns
- ranks : same type as caller
 
- Examples
>>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 3, 2, 1]}, columns=['A', 'B'])
>>> df
   A  B
0  1  4
1  2  3
2  2  2
3  3  1

>>> df.rank().sort_index()
     A    B
0  1.0  4.0
1  2.5  3.0
2  2.5  2.0
3  4.0  1.0

If method is set to ‘min’, it uses lowest rank in group.

>>> df.rank(method='min').sort_index()
     A    B
0  1.0  4.0
1  2.0  3.0
2  2.0  2.0
3  4.0  1.0

If method is set to ‘max’, it uses highest rank in group.

>>> df.rank(method='max').sort_index()
     A    B
0  1.0  4.0
1  3.0  3.0
2  3.0  2.0
3  4.0  1.0

If method is set to ‘dense’, it leaves no gaps in group.

>>> df.rank(method='dense').sort_index()
     A    B
0  1.0  4.0
1  2.0  3.0
2  2.0  2.0
3  3.0  1.0

If numeric_only is set to True, rank only numeric columns.

>>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': ['a', 'b', 'd', 'c']}, columns=['A', 'B'])
>>> df
   A  B
0  1  a
1  2  b
2  2  d
3  3  c
>>> df.rank(numeric_only=True)
     A
0  1.0
1  2.5
2  2.5
3  4.0