pyspark.pandas.groupby.GroupBy.prod

GroupBy.prod(numeric_only=False, min_count=0)

Compute the product of the values in each group.

New in version 3.4.0.

Parameters
numeric_only : bool, default False

Include only float, int, and boolean columns.

Changed in version 4.0.0.

min_count : int, default 0

The required number of valid (non-NA) values to perform the operation. If a group has fewer than min_count non-NA values, its result is NA.

Returns
Series or DataFrame

The product of the values within each group.

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame(
...     {
...         "A": [1, 1, 2, 1, 2],
...         "B": [np.nan, 2, 3, 4, 5],
...         "C": [1, 2, 1, 1, 2],
...         "D": [True, False, True, False, True],
...     }
... )

Group by one column and return the product of the remaining columns in each group.

>>> df.groupby('A').prod().sort_index()
      B  C  D
A
1   8.0  2  0
2  15.0  2  1
>>> df.groupby('A').prod(min_count=3).sort_index()
     B    C    D
A
1  NaN  2.0  0.0
2  NaN  NaN  NaN
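
The min_count rule can also be traced by hand. The following is a minimal sketch in plain Python, not the pyspark implementation: `group_prod` is a hypothetical helper written for illustration, and it uses `None` to stand in for NA. It skips NaN values, multiplies what remains, and yields `None` when a group has fewer than min_count valid values.

```python
import math
from collections import defaultdict


def group_prod(keys, values, min_count=0):
    """Product of the non-NaN values per key.

    Hypothetical helper for illustration only; None stands in for NA.
    """
    valid = defaultdict(list)
    for key, value in zip(keys, values):
        valid[key]  # register the key even if all of its values are NaN
        if not (isinstance(value, float) and math.isnan(value)):
            valid[key].append(value)
    # A group needs at least min_count valid values to produce a product.
    return {
        key: math.prod(vals) if len(vals) >= min_count else None
        for key, vals in sorted(valid.items())
    }


# Columns A, B and C from the example DataFrame above.
A = [1, 1, 2, 1, 2]
B = [float("nan"), 2, 3, 4, 5]
C = [1, 2, 1, 1, 2]

print(group_prod(A, B))               # {1: 8, 2: 15}
print(group_prod(A, B, min_count=3))  # {1: None, 2: None}
print(group_prod(A, C, min_count=3))  # {1: 2, 2: None}
```

With min_count=3, group 1 of column B has only two valid values (2 and 4), so its result is missing, while group 1 of column C has three valid values and keeps its product.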