pyspark.pandas.DataFrame.loc
- property DataFrame.loc
Access a group of rows and columns by label(s) or a boolean Series.

.loc[] is primarily label based, but may also be used with a conditional boolean Series derived from the DataFrame or Series.

Allowed inputs are:

- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index) for column selection.
- A list or array of labels, e.g. ['a', 'b', 'c'].
- A slice object with labels, e.g. 'a':'f'.
- A conditional boolean Series derived from the DataFrame or Series.
- A boolean array of the same length as the column axis being sliced, e.g. [True, False, True].
- A boolean pandas Series that can be aligned to the column axis being sliced. The index of the key is aligned before masking.

Inputs that pandas allows but that are not allowed here:

- A boolean array of the same length as the row axis being sliced, e.g. [True, False, True].
- A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above).
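
Because a callable is not accepted as a key, one possible workaround is to call the function first and pass its result to `.loc`. The following is a minimal sketch under stated assumptions, not part of the pandas-on-Spark API: `loc_by_callable` is a hypothetical helper, and it is exercised with plain pandas so that it runs without a Spark session. With `pyspark.pandas` the same pattern should apply as long as the callable returns a boolean Series derived from the same DataFrame.

```python
import pandas as pd


def loc_by_callable(df, make_key):
    """Hypothetical helper: evaluate the callable eagerly, then index with its result.

    `make_key` receives the DataFrame and must return a valid `.loc` key,
    for example a boolean Series derived from `df`.
    """
    key = make_key(df)
    return df.loc[key]


# Plain pandas stand-in (an assumption, used only so the sketch runs anywhere).
df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Equivalent to df.loc[df["shield"] > 6], with the condition written as a function.
fast = loc_by_callable(df, lambda d: d["shield"] > 6)
print(fast)
```

Evaluating the callable up front keeps the key in one of the allowed forms, so no special support for callables is needed.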

Note: MultiIndex is not supported yet.

Note: Contrary to usual Python slices, both the start and the stop are included, and a slice step is not allowed.

Note: With a list or array of labels for row selection, pandas-on-Spark behaves as a filter without reordering by the labels.

See also

- Series.loc: Access group of values using labels.
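
The notes on inclusive slices and on list selection can be illustrated by contrast with plain pandas. The sketch below uses pandas only (an assumption made so that it runs without a Spark session). The `isin` filter is not how pandas-on-Spark is implemented; it only mimics the "filter without reordering" behavior described in the note.

```python
import pandas as pd

pdf = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# A usual Python slice excludes the stop position.
labels = list(pdf.index)
positional = labels[0:1]  # only the first label

# A label slice in .loc includes both the start and the stop label.
inclusive = pdf.loc["cobra":"viper"]

# Plain pandas returns rows in the order in which the labels were requested.
requested = ["sidewinder", "viper"]
reordered = pdf.loc[requested]

# Filter semantics: keep the matching rows in their original order.
filtered = pdf[pdf.index.isin(requested)]

print(list(inclusive.index))
print(list(reordered.index))
print(list(filtered.index))
```

Code that relies on the result following the order of the requested labels may therefore need an explicit sort or reindex step after selection.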

Examples

Getting values

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using [[]] returns a DataFrame. Also note that pandas-on-Spark behaves just as a filter without reordering by the labels.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

>>> df.loc[['sidewinder', 'viper']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for column.

>>> int(df.loc['cobra', 'shield'])
2

List of labels for row.

>>> df.loc[['cobra'], 'shield']
cobra    2
Name: shield, dtype: int64

List of labels for column.

>>> df.loc['cobra', ['shield']]
shield    2
Name: cobra, dtype: int64

List of labels for both row and column.

>>> df.loc[['cobra'], ['shield']]
       shield
cobra       2

Slice with labels for row and single label for column. Note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Conditional that returns a boolean Series.

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified.

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

A boolean array of the same length as the column axis being sliced.

>>> df.loc[:, [False, True]]
            shield
cobra            2
viper            5
sidewinder       8

A boolean Series that can be aligned to the column axis being sliced.

>>> df.loc[:, pd.Series([False, True], index=['max_speed', 'shield'])]
            shield
cobra            2
viper            5
sidewinder       8

Setting values

Setting value for all items matching the list of labels.

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Setting value for an entire row.

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column.

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for an entire list of columns.

>>> df.loc[:, ['max_speed', 'shield']] = 100
>>> df
            max_speed  shield
cobra             100     100
viper             100     100
sidewinder        100     100

Set value with Series.

>>> df.loc[:, 'shield'] = df['shield'] * 2
>>> df
            max_speed  shield
cobra             100     200
viper             100     200
sidewinder        100     200

Getting values on a DataFrame with an index that has integer labels

Another example using integers for the index.

>>> df = ps.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=[7, 8, 9],
...                   columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. Note that both the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8
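
The integer-index example relies on integers being treated as labels rather than positions. As a rough illustration, the sketch below contrasts label-based and position-based selection in plain pandas (an assumption made so that it runs without a Spark session). The variable names are illustrative only.

```python
import pandas as pd

pdf = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=[7, 8, 9],
    columns=["max_speed", "shield"],
)

# Label-based: selects the rows labelled 7, 8 and 9 (stop label included).
by_label = pdf.loc[7:9]

# Position-based: selects positions 0 and 1 only (stop position excluded).
by_position = pdf.iloc[0:2]

# 8 is looked up as an index label, not as the ninth row.
row_eight = pdf.loc[8]

# 0 is not a label in this index, so a label lookup fails.
try:
    pdf.loc[0]
    zero_is_label = True
except KeyError:
    zero_is_label = False

print(list(by_label.index), list(by_position.index), zero_is_label)
```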