=====================
Supported pandas API
=====================

.. currentmodule:: pyspark.pandas

The following tables show which pandas APIs are implemented and which are not yet implemented in pandas API on Spark. Some implemented APIs do not support every pandas parameter, so the third column lists the missing parameters for each API.

* 'Y' in the second column means the API is implemented with all of its parameters.
* 'N' means the API is not implemented yet.
* 'P' means the API is partially implemented, with some parameters missing.

Every API listed below computes its result with distributed execution, except for those that require local execution by design. For example, `DataFrame.to_numpy() `__ must collect the data to the driver side.

If a pandas API or parameter you need is not implemented, you can create an `Apache Spark JIRA `__ to request it or contribute it yourself.

The API list is based on the `latest pandas official API reference `__.

CategoricalIndex API
----------------------------------------------------------------------------------------------------

.. currentmodule:: pyspark.pandas.CategoricalIndex

.. 
list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`add_categories` - Y - * - :func:`all` - Y - * - :func:`any` - Y - * - :func:`append` - Y - * - :func:`argmax` - P - ``axis`` , ``skipna`` * - :func:`argmin` - P - ``axis`` , ``skipna`` * - argsort - N - * - :func:`as_ordered` - Y - * - :func:`as_unordered` - Y - * - :func:`asof` - Y - * - asof_locs - N - * - :func:`astype` - P - ``copy`` * - :func:`copy` - Y - * - :func:`delete` - Y - * - diff - N - * - :func:`difference` - Y - * - :func:`drop` - P - ``errors`` * - :func:`drop_duplicates` - Y - * - :func:`droplevel` - Y - * - :func:`dropna` - Y - * - duplicated - N - * - :func:`equals` - Y - * - :func:`factorize` - Y - * - :func:`fillna` - P - ``downcast`` * - format - N - * - get_indexer - N - * - get_indexer_for - N - * - get_indexer_non_unique - N - * - :func:`get_level_values` - Y - * - get_loc - N - * - get_slice_bound - N - * - groupby - N - * - :func:`holds_integer` - Y - * - :func:`identical` - Y - * - infer_objects - N - * - :func:`insert` - Y - * - :func:`intersection` - P - ``sort`` * - is\_ - N - * - :func:`is_boolean` - Y - * - :func:`is_categorical` - Y - * - :func:`is_floating` - Y - * - :func:`is_integer` - Y - * - :func:`is_interval` - Y - * - :func:`is_numeric` - Y - * - :func:`is_object` - Y - * - :func:`isin` - P - ``level`` * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`item` - Y - * - join - N - * - :func:`map` - P - ``na_action`` * - :func:`max` - Y - * - memory_usage - N - * - :func:`min` - Y - * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nunique` - Y - * - putmask - N - * - ravel - N - * - reindex - N - * - :func:`remove_categories` - Y - * - :func:`remove_unused_categories` - Y - * - :func:`rename` - Y - * - :func:`rename_categories` - Y - * - :func:`reorder_categories` - Y - * - :func:`repeat` - P - ``axis`` * - round - N - * - searchsorted - N - * - :func:`set_categories` - Y - * - :func:`set_names` - Y - * - :func:`shift` - P - 
``freq`` * - slice_indexer - N - * - slice_locs - N - * - :func:`sort` - Y - * - :func:`sort_values` - P - ``key`` , ``na_position`` * - sortlevel - N - * - :func:`symmetric_difference` - Y - * - :func:`take` - P - ``allow_fill`` , ``axis`` , ``fill_value`` * - to_flat_index - N - * - :func:`to_frame` - Y - * - :func:`to_list` - Y - * - :func:`to_numpy` - P - ``na_value`` * - :func:`to_series` - P - ``index`` * - :func:`tolist` - Y - * - :func:`transpose` - Y - * - :func:`union` - Y - * - :func:`unique` - Y - * - :func:`value_counts` - Y - * - :func:`view` - Y - * - where - N - DataFrame API ---------------------------------------------------------------------------------------------------- .. currentmodule:: pyspark.pandas.DataFrame .. list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`abs` - Y - * - :func:`add` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`add_prefix` - P - ``axis`` * - :func:`add_suffix` - P - ``axis`` * - :func:`agg` - P - ``axis`` * - :func:`aggregate` - P - ``axis`` * - :func:`align` - P - ``broadcast_axis`` , ``fill_axis`` , ``fill_value`` , ``level`` , ``limit`` and more. See the `pandas.DataFrame.align `__ and `pyspark.pandas.DataFrame.align `__ for detail. * - :func:`all` - Y - * - :func:`any` - P - ``skipna`` * - :func:`apply` - P - ``by_row`` , ``engine`` , ``engine_kwargs`` , ``raw`` , ``result_type`` * - :func:`applymap` - P - ``na_action`` * - asfreq - N - * - asof - N - * - :func:`assign` - Y - * - :func:`astype` - P - ``copy`` , ``errors`` * - :func:`at_time` - Y - * - :func:`backfill` - P - ``downcast`` * - :func:`between_time` - Y - * - :func:`bfill` - P - ``downcast`` , ``limit_area`` * - :func:`bool` - Y - * - :func:`boxplot` - P - ``ax`` , ``backend`` , ``by`` , ``column`` , ``figsize`` and more. See the `pandas.DataFrame.boxplot `__ and `pyspark.pandas.DataFrame.boxplot `__ for detail. 
* - :func:`clip` - P - ``axis`` , ``inplace`` * - combine - N - * - :func:`combine_first` - Y - * - compare - N - * - convert_dtypes - N - * - :func:`copy` - Y - * - :func:`corr` - P - ``numeric_only`` * - :func:`corrwith` - P - ``numeric_only`` * - :func:`count` - Y - * - :func:`cov` - P - ``numeric_only`` * - :func:`cummax` - P - ``axis`` * - :func:`cummin` - P - ``axis`` * - :func:`cumprod` - P - ``axis`` * - :func:`cumsum` - P - ``axis`` * - :func:`describe` - P - ``exclude`` , ``include`` * - :func:`diff` - Y - * - :func:`div` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`divide` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`dot` - Y - * - :func:`drop` - P - ``errors`` , ``inplace`` , ``level`` * - :func:`drop_duplicates` - Y - * - :func:`droplevel` - Y - * - :func:`dropna` - P - ``ignore_index`` * - :func:`duplicated` - Y - * - :func:`eq` - P - ``axis`` , ``level`` * - :func:`equals` - Y - * - :func:`eval` - Y - * - :func:`ewm` - P - ``adjust`` , ``axis`` , ``method`` , ``times`` * - :func:`expanding` - P - ``axis`` , ``method`` * - :func:`explode` - Y - * - :func:`ffill` - P - ``downcast`` , ``limit_area`` * - :func:`fillna` - P - ``downcast`` * - :func:`filter` - Y - * - :func:`first` - Y - * - :func:`first_valid_index` - Y - * - :func:`floordiv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`ge` - P - ``axis`` , ``level`` * - :func:`get` - Y - * - :func:`groupby` - P - ``group_keys`` , ``level`` , ``observed`` , ``sort`` * - :func:`gt` - P - ``axis`` , ``level`` * - :func:`head` - Y - * - :func:`hist` - P - ``ax`` , ``backend`` , ``by`` , ``column`` , ``data`` and more. See the `pandas.DataFrame.hist `__ and `pyspark.pandas.DataFrame.hist `__ for detail. 
* - :func:`idxmax` - P - ``numeric_only`` , ``skipna`` * - :func:`idxmin` - P - ``numeric_only`` , ``skipna`` * - infer_objects - N - * - :func:`info` - P - ``memory_usage`` * - :func:`insert` - Y - * - :func:`interpolate` - P - ``axis`` , ``downcast`` , ``inplace`` * - isetitem - N - * - :func:`isin` - Y - * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`items` - Y - * - :func:`iterrows` - Y - * - :func:`itertuples` - Y - * - :func:`join` - P - ``other`` , ``sort`` , ``validate`` * - :func:`keys` - Y - * - :func:`kurt` - Y - * - :func:`kurtosis` - Y - * - :func:`last` - Y - * - :func:`last_valid_index` - Y - * - :func:`le` - P - ``axis`` , ``level`` * - :func:`lt` - P - ``axis`` , ``level`` * - :func:`map` - P - ``na_action`` * - :func:`mask` - P - ``axis`` , ``inplace`` , ``level`` * - :func:`max` - Y - * - :func:`mean` - Y - * - :func:`median` - Y - * - :func:`melt` - P - ``col_level`` , ``ignore_index`` * - memory_usage - N - * - :func:`merge` - P - ``copy`` , ``indicator`` , ``sort`` , ``validate`` * - :func:`min` - Y - * - :func:`mod` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`mode` - Y - * - :func:`mul` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`multiply` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`ne` - P - ``axis`` , ``level`` * - :func:`nlargest` - Y - * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nsmallest` - Y - * - :func:`nunique` - Y - * - :func:`pad` - P - ``downcast`` * - :func:`pct_change` - P - ``fill_method`` , ``freq`` , ``limit`` * - :func:`pipe` - Y - * - :func:`pivot` - Y - * - :func:`pivot_table` - P - ``dropna`` , ``margins`` , ``margins_name`` , ``observed`` , ``sort`` * - :func:`pop` - Y - * - :func:`pow` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`prod` - Y - * - :func:`product` - Y - * - :func:`quantile` - P - ``interpolation`` , ``method`` * - :func:`query` - Y - * - :func:`radd` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rank` - P - ``axis`` , 
``na_option`` , ``pct`` * - :func:`rdiv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`reindex` - P - ``level`` , ``limit`` , ``method`` , ``tolerance`` * - :func:`reindex_like` - P - ``limit`` , ``method`` , ``tolerance`` * - :func:`rename` - P - ``copy`` * - :func:`rename_axis` - P - ``copy`` * - reorder_levels - N - * - :func:`replace` - Y - * - :func:`resample` - P - ``axis`` , ``convention`` , ``group_keys`` , ``kind`` , ``level`` and more. See the `pandas.DataFrame.resample `__ and `pyspark.pandas.DataFrame.resample `__ for detail. * - :func:`reset_index` - P - ``allow_duplicates`` , ``names`` * - :func:`rfloordiv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rmod` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rmul` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rolling` - P - ``axis`` , ``center`` , ``closed`` , ``method`` , ``on`` and more. See the `pandas.DataFrame.rolling `__ and `pyspark.pandas.DataFrame.rolling `__ for detail. * - :func:`round` - Y - * - :func:`rpow` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rsub` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rtruediv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`sample` - P - ``axis`` , ``weights`` * - :func:`select_dtypes` - Y - * - :func:`sem` - Y - * - set_axis - N - * - set_flags - N - * - :func:`set_index` - P - ``verify_integrity`` * - :func:`shift` - P - ``axis`` , ``freq`` , ``suffix`` * - :func:`skew` - Y - * - :func:`sort_index` - P - ``key`` , ``sort_remaining`` * - :func:`sort_values` - P - ``axis`` , ``key`` , ``kind`` * - :func:`squeeze` - Y - * - :func:`stack` - P - ``dropna`` , ``future_stack`` , ``level`` , ``sort`` * - :func:`std` - Y - * - :func:`sub` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`subtract` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`sum` - Y - * - :func:`swapaxes` - P - ``axis1`` , ``axis2`` * - :func:`swaplevel` - Y - * - :func:`tail` - Y - * - :func:`take` - Y - * - 
:func:`to_clipboard` - Y - * - :func:`to_csv` - P - ``chunksize`` , ``compression`` , ``decimal`` , ``doublequote`` , ``encoding`` and more. See the `pandas.DataFrame.to_csv `__ and `pyspark.pandas.DataFrame.to_csv `__ for detail. * - :func:`to_dict` - P - ``index`` * - :func:`to_excel` - P - ``engine_kwargs`` , ``storage_options`` * - :func:`to_feather` - Y - * - to_gbq - N - * - :func:`to_hdf` - Y - * - :func:`to_html` - P - ``encoding`` * - :func:`to_json` - P - ``date_format`` , ``date_unit`` , ``default_handler`` , ``double_precision`` , ``force_ascii`` and more. See the `pandas.DataFrame.to_json `__ and `pyspark.pandas.DataFrame.to_json `__ for detail. * - :func:`to_latex` - P - ``caption`` , ``label`` , ``position`` * - :func:`to_markdown` - P - ``index`` , ``storage_options`` * - :func:`to_numpy` - P - ``copy`` , ``dtype`` , ``na_value`` * - :func:`to_orc` - P - ``engine`` , ``engine_kwargs`` , ``index`` * - :func:`to_parquet` - P - ``engine`` , ``index`` , ``storage_options`` * - to_period - N - * - to_pickle - N - * - :func:`to_records` - Y - * - to_sql - N - * - :func:`to_stata` - Y - * - :func:`to_string` - P - ``encoding`` , ``max_colwidth`` , ``min_rows`` * - to_timestamp - N - * - to_xarray - N - * - to_xml - N - * - :func:`transform` - Y - * - :func:`transpose` - P - ``copy`` * - :func:`truediv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`truncate` - Y - * - tz_convert - N - * - tz_localize - N - * - :func:`unstack` - P - ``fill_value`` , ``level`` , ``sort`` * - :func:`update` - P - ``errors`` , ``filter_func`` * - value_counts - N - * - :func:`var` - P - ``skipna`` * - :func:`where` - P - ``inplace`` , ``level`` * - :func:`xs` - P - ``drop_level`` DatetimeIndex API ---------------------------------------------------------------------------------------------------- .. currentmodule:: pyspark.pandas.DatetimeIndex .. 
list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`all` - Y - * - :func:`any` - Y - * - :func:`append` - Y - * - :func:`argmax` - P - ``axis`` , ``skipna`` * - :func:`argmin` - P - ``axis`` , ``skipna`` * - argsort - N - * - as_unit - N - * - :func:`asof` - Y - * - asof_locs - N - * - :func:`astype` - P - ``copy`` * - :func:`ceil` - Y - * - :func:`copy` - Y - * - :func:`day_name` - Y - * - :func:`delete` - Y - * - diff - N - * - :func:`difference` - Y - * - :func:`drop` - P - ``errors`` * - :func:`drop_duplicates` - Y - * - :func:`droplevel` - Y - * - :func:`dropna` - Y - * - duplicated - N - * - :func:`equals` - Y - * - :func:`factorize` - Y - * - :func:`fillna` - P - ``downcast`` * - :func:`floor` - Y - * - format - N - * - get_indexer - N - * - get_indexer_for - N - * - get_indexer_non_unique - N - * - :func:`get_level_values` - Y - * - get_loc - N - * - get_slice_bound - N - * - groupby - N - * - :func:`holds_integer` - Y - * - :func:`identical` - Y - * - :func:`indexer_at_time` - Y - * - :func:`indexer_between_time` - Y - * - infer_objects - N - * - :func:`insert` - Y - * - :func:`intersection` - P - ``sort`` * - is\_ - N - * - :func:`is_boolean` - Y - * - :func:`is_categorical` - Y - * - :func:`is_floating` - Y - * - :func:`is_integer` - Y - * - :func:`is_interval` - Y - * - :func:`is_numeric` - Y - * - :func:`is_object` - Y - * - :func:`isin` - P - ``level`` * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`isocalendar` - Y - * - :func:`item` - Y - * - join - N - * - :func:`map` - Y - * - :func:`max` - P - ``axis`` , ``skipna`` * - mean - N - * - memory_usage - N - * - :func:`min` - P - ``axis`` , ``skipna`` * - :func:`month_name` - Y - * - :func:`normalize` - Y - * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nunique` - Y - * - putmask - N - * - ravel - N - * - reindex - N - * - :func:`rename` - Y - * - :func:`repeat` - P - ``axis`` * - :func:`round` - Y - * - searchsorted - N - * - :func:`set_names` - 
Y - * - :func:`shift` - P - ``freq`` * - slice_indexer - N - * - slice_locs - N - * - snap - N - * - :func:`sort` - Y - * - :func:`sort_values` - P - ``key`` , ``na_position`` * - sortlevel - N - * - std - N - * - :func:`strftime` - Y - * - :func:`symmetric_difference` - Y - * - :func:`take` - P - ``allow_fill`` , ``axis`` , ``fill_value`` * - to_flat_index - N - * - :func:`to_frame` - Y - * - to_julian_date - N - * - :func:`to_list` - Y - * - :func:`to_numpy` - P - ``na_value`` * - to_period - N - * - to_pydatetime - N - * - :func:`to_series` - P - ``index`` * - :func:`tolist` - Y - * - :func:`transpose` - Y - * - tz_convert - N - * - tz_localize - N - * - :func:`union` - Y - * - :func:`unique` - Y - * - :func:`value_counts` - Y - * - :func:`view` - Y - * - where - N - Index API ---------------------------------------------------------------------------------------------------- .. currentmodule:: pyspark.pandas.Index .. list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`all` - Y - * - :func:`any` - Y - * - :func:`append` - Y - * - :func:`argmax` - P - ``axis`` , ``skipna`` * - :func:`argmin` - P - ``axis`` , ``skipna`` * - argsort - N - * - :func:`asof` - Y - * - asof_locs - N - * - :func:`astype` - P - ``copy`` * - :func:`copy` - Y - * - :func:`delete` - Y - * - diff - N - * - :func:`difference` - Y - * - :func:`drop` - P - ``errors`` * - :func:`drop_duplicates` - Y - * - :func:`droplevel` - Y - * - :func:`dropna` - Y - * - duplicated - N - * - :func:`equals` - Y - * - :func:`factorize` - Y - * - :func:`fillna` - P - ``downcast`` * - format - N - * - get_indexer - N - * - get_indexer_for - N - * - get_indexer_non_unique - N - * - :func:`get_level_values` - Y - * - get_loc - N - * - get_slice_bound - N - * - groupby - N - * - :func:`holds_integer` - Y - * - :func:`identical` - Y - * - infer_objects - N - * - :func:`insert` - Y - * - :func:`intersection` - P - ``sort`` * - is\_ - N - * - :func:`is_boolean` - Y - * - 
:func:`is_categorical` - Y - * - :func:`is_floating` - Y - * - :func:`is_integer` - Y - * - :func:`is_interval` - Y - * - :func:`is_numeric` - Y - * - :func:`is_object` - Y - * - :func:`isin` - P - ``level`` * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`item` - Y - * - join - N - * - :func:`map` - Y - * - :func:`max` - P - ``axis`` , ``skipna`` * - memory_usage - N - * - :func:`min` - P - ``axis`` , ``skipna`` * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nunique` - Y - * - putmask - N - * - ravel - N - * - reindex - N - * - :func:`rename` - Y - * - :func:`repeat` - P - ``axis`` * - round - N - * - searchsorted - N - * - :func:`set_names` - Y - * - :func:`shift` - P - ``freq`` * - slice_indexer - N - * - slice_locs - N - * - :func:`sort` - Y - * - :func:`sort_values` - P - ``key`` , ``na_position`` * - sortlevel - N - * - :func:`symmetric_difference` - Y - * - :func:`take` - P - ``allow_fill`` , ``axis`` , ``fill_value`` * - to_flat_index - N - * - :func:`to_frame` - Y - * - :func:`to_list` - Y - * - :func:`to_numpy` - P - ``na_value`` * - :func:`to_series` - P - ``index`` * - :func:`tolist` - Y - * - :func:`transpose` - Y - * - :func:`union` - Y - * - :func:`unique` - Y - * - :func:`value_counts` - Y - * - :func:`view` - Y - * - where - N - MultiIndex API ---------------------------------------------------------------------------------------------------- .. currentmodule:: pyspark.pandas.MultiIndex .. 
list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`all` - Y - * - :func:`any` - Y - * - :func:`append` - Y - * - :func:`argmax` - P - ``axis`` , ``skipna`` * - :func:`argmin` - P - ``axis`` , ``skipna`` * - argsort - N - * - :func:`asof` - Y - * - asof_locs - N - * - :func:`astype` - P - ``copy`` * - :func:`copy` - P - ``name`` , ``names`` * - :func:`delete` - Y - * - diff - N - * - :func:`difference` - Y - * - :func:`drop` - P - ``errors`` * - :func:`drop_duplicates` - Y - * - :func:`droplevel` - Y - * - :func:`dropna` - Y - * - duplicated - N - * - :func:`equal_levels` - Y - * - :func:`equals` - Y - * - :func:`factorize` - P - ``use_na_sentinel`` * - :func:`fillna` - P - ``downcast`` * - format - N - * - get_indexer - N - * - get_indexer_for - N - * - get_indexer_non_unique - N - * - :func:`get_level_values` - Y - * - get_loc - N - * - get_loc_level - N - * - get_locs - N - * - get_slice_bound - N - * - groupby - N - * - :func:`holds_integer` - Y - * - :func:`identical` - Y - * - infer_objects - N - * - :func:`insert` - Y - * - :func:`intersection` - P - ``sort`` * - is\_ - N - * - :func:`is_boolean` - Y - * - :func:`is_categorical` - Y - * - :func:`is_floating` - Y - * - :func:`is_integer` - Y - * - :func:`is_interval` - Y - * - :func:`is_numeric` - Y - * - :func:`is_object` - Y - * - :func:`isin` - P - ``level`` * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`item` - Y - * - join - N - * - :func:`map` - Y - * - :func:`max` - P - ``axis`` , ``skipna`` * - memory_usage - N - * - :func:`min` - P - ``axis`` , ``skipna`` * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nunique` - Y - * - putmask - N - * - ravel - N - * - reindex - N - * - remove_unused_levels - N - * - :func:`rename` - P - ``level`` , ``names`` * - reorder_levels - N - * - :func:`repeat` - P - ``axis`` * - round - N - * - searchsorted - N - * - set_codes - N - * - set_levels - N - * - :func:`set_names` - Y - * - :func:`shift` - P - ``freq`` * - 
slice_indexer - N - * - slice_locs - N - * - :func:`sort` - Y - * - :func:`sort_values` - P - ``key`` , ``na_position`` * - sortlevel - N - * - :func:`swaplevel` - Y - * - :func:`symmetric_difference` - Y - * - :func:`take` - P - ``allow_fill`` , ``axis`` , ``fill_value`` * - to_flat_index - N - * - :func:`to_frame` - P - ``allow_duplicates`` * - :func:`to_list` - Y - * - :func:`to_numpy` - P - ``na_value`` * - :func:`to_series` - P - ``index`` * - :func:`tolist` - Y - * - :func:`transpose` - Y - * - truncate - N - * - :func:`union` - Y - * - :func:`unique` - Y - * - :func:`value_counts` - Y - * - :func:`view` - Y - * - where - N - Series API ---------------------------------------------------------------------------------------------------- .. currentmodule:: pyspark.pandas.Series .. list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`abs` - Y - * - :func:`add` - P - ``axis`` , ``level`` * - :func:`add_prefix` - P - ``axis`` * - :func:`add_suffix` - P - ``axis`` * - :func:`agg` - P - ``axis`` * - :func:`aggregate` - P - ``axis`` * - :func:`align` - P - ``broadcast_axis`` , ``fill_axis`` , ``fill_value`` , ``level`` , ``limit`` and more. See the `pandas.Series.align `__ and `pyspark.pandas.Series.align `__ for detail. 
* - :func:`all` - P - ``bool_only`` * - :func:`any` - P - ``bool_only`` , ``skipna`` * - :func:`apply` - P - ``by_row`` , ``convert_dtype`` * - :func:`argmax` - Y - * - :func:`argmin` - Y - * - :func:`argsort` - P - ``axis`` , ``kind`` , ``order`` , ``stable`` * - asfreq - N - * - :func:`asof` - P - ``subset`` * - :func:`astype` - P - ``copy`` , ``errors`` * - :func:`at_time` - Y - * - :func:`autocorr` - Y - * - :func:`backfill` - P - ``downcast`` * - :func:`between` - Y - * - :func:`between_time` - Y - * - :func:`bfill` - P - ``downcast`` , ``limit_area`` * - :func:`bool` - Y - * - case_when - N - * - :func:`clip` - P - ``axis`` * - combine - N - * - :func:`combine_first` - Y - * - :func:`compare` - P - ``align_axis`` , ``result_names`` * - convert_dtypes - N - * - :func:`copy` - Y - * - :func:`corr` - Y - * - :func:`count` - Y - * - :func:`cov` - Y - * - :func:`cummax` - P - ``axis`` * - :func:`cummin` - P - ``axis`` * - :func:`cumprod` - P - ``axis`` * - :func:`cumsum` - P - ``axis`` * - :func:`describe` - P - ``exclude`` , ``include`` * - :func:`diff` - Y - * - :func:`div` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`divide` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`divmod` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`dot` - Y - * - :func:`drop` - P - ``axis`` , ``errors`` * - :func:`drop_duplicates` - P - ``ignore_index`` * - :func:`droplevel` - P - ``axis`` * - :func:`dropna` - P - ``how`` , ``ignore_index`` * - :func:`duplicated` - Y - * - :func:`eq` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`equals` - Y - * - :func:`ewm` - P - ``adjust`` , ``axis`` , ``method`` , ``times`` * - :func:`expanding` - P - ``axis`` , ``method`` * - :func:`explode` - P - ``ignore_index`` * - :func:`factorize` - Y - * - :func:`ffill` - P - ``downcast`` , ``limit_area`` * - :func:`fillna` - P - ``downcast`` * - :func:`filter` - Y - * - :func:`first` - Y - * - :func:`first_valid_index` - Y - * - :func:`floordiv` - P - ``axis`` , 
``fill_value`` , ``level`` * - :func:`ge` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`get` - Y - * - :func:`groupby` - P - ``group_keys`` , ``level`` , ``observed`` , ``sort`` * - :func:`gt` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`head` - Y - * - :func:`hist` - P - ``ax`` , ``backend`` , ``by`` , ``figsize`` , ``grid`` and more. See the `pandas.Series.hist `__ and `pyspark.pandas.Series.hist `__ for detail. * - :func:`idxmax` - P - ``axis`` * - :func:`idxmin` - P - ``axis`` * - infer_objects - N - * - info - N - * - :func:`interpolate` - P - ``axis`` , ``downcast`` , ``inplace`` * - :func:`isin` - Y - * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`item` - Y - * - :func:`items` - Y - * - :func:`keys` - Y - * - :func:`kurt` - Y - * - :func:`kurtosis` - Y - * - :func:`last` - Y - * - :func:`last_valid_index` - Y - * - :func:`le` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`lt` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`map` - Y - * - :func:`mask` - P - ``axis`` , ``inplace`` , ``level`` * - :func:`max` - Y - * - :func:`mean` - Y - * - :func:`median` - Y - * - memory_usage - N - * - :func:`min` - Y - * - :func:`mod` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`mode` - Y - * - :func:`mul` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`multiply` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`ne` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`nlargest` - P - ``keep`` * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nsmallest` - P - ``keep`` * - :func:`nunique` - Y - * - :func:`pad` - P - ``downcast`` * - :func:`pct_change` - P - ``fill_method`` , ``freq`` , ``limit`` * - :func:`pipe` - Y - * - :func:`pop` - Y - * - :func:`pow` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`prod` - Y - * - :func:`product` - Y - * - :func:`quantile` - P - ``interpolation`` * - :func:`radd` - P - ``axis`` , ``level`` * - :func:`rank` - P - ``axis`` , ``na_option`` , ``pct`` * - 
ravel - N - * - :func:`rdiv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rdivmod` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`reindex` - P - ``axis`` , ``copy`` , ``level`` , ``limit`` , ``method`` and more. See the `pandas.Series.reindex `__ and `pyspark.pandas.Series.reindex `__ for detail. * - :func:`reindex_like` - P - ``copy`` , ``limit`` , ``method`` , ``tolerance`` * - :func:`rename` - P - ``axis`` , ``copy`` , ``errors`` , ``inplace`` , ``level`` * - :func:`rename_axis` - P - ``axis`` , ``copy`` * - reorder_levels - N - * - :func:`repeat` - P - ``axis`` * - :func:`replace` - P - ``inplace`` , ``limit`` , ``method`` * - :func:`resample` - P - ``axis`` , ``convention`` , ``group_keys`` , ``kind`` , ``level`` and more. See the `pandas.Series.resample `__ and `pyspark.pandas.Series.resample `__ for detail. * - :func:`reset_index` - P - ``allow_duplicates`` * - :func:`rfloordiv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rmod` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rmul` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rolling` - P - ``axis`` , ``center`` , ``closed`` , ``method`` , ``on`` and more. See the `pandas.Series.rolling `__ and `pyspark.pandas.Series.rolling `__ for detail. 
* - :func:`round` - Y - * - :func:`rpow` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rsub` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`rtruediv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`sample` - P - ``axis`` , ``weights`` * - :func:`searchsorted` - P - ``sorter`` * - :func:`sem` - Y - * - set_axis - N - * - set_flags - N - * - :func:`shift` - P - ``axis`` , ``freq`` , ``suffix`` * - :func:`skew` - Y - * - :func:`sort_index` - P - ``key`` , ``sort_remaining`` * - :func:`sort_values` - P - ``axis`` , ``key`` , ``kind`` * - :func:`squeeze` - Y - * - :func:`std` - Y - * - :func:`sub` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`subtract` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`sum` - Y - * - :func:`swapaxes` - P - ``axis1`` , ``axis2`` * - :func:`swaplevel` - Y - * - :func:`tail` - Y - * - :func:`take` - P - ``axis`` * - :func:`to_clipboard` - Y - * - :func:`to_csv` - P - ``chunksize`` , ``compression`` , ``decimal`` , ``doublequote`` , ``encoding`` and more. See the `pandas.Series.to_csv `__ and `pyspark.pandas.Series.to_csv `__ for detail. * - :func:`to_dict` - Y - * - :func:`to_excel` - P - ``engine_kwargs`` , ``storage_options`` * - :func:`to_frame` - Y - * - :func:`to_hdf` - Y - * - :func:`to_json` - P - ``date_format`` , ``date_unit`` , ``default_handler`` , ``double_precision`` , ``force_ascii`` and more. See the `pandas.Series.to_json `__ and `pyspark.pandas.Series.to_json `__ for detail. 
* - :func:`to_latex` - P - ``caption`` , ``label`` , ``position`` * - :func:`to_list` - Y - * - :func:`to_markdown` - P - ``index`` , ``storage_options`` * - :func:`to_numpy` - P - ``copy`` , ``dtype`` , ``na_value`` * - to_period - N - * - to_pickle - N - * - to_sql - N - * - :func:`to_string` - P - ``min_rows`` * - to_timestamp - N - * - to_xarray - N - * - :func:`tolist` - Y - * - :func:`transform` - Y - * - :func:`transpose` - Y - * - :func:`truediv` - P - ``axis`` , ``fill_value`` , ``level`` * - :func:`truncate` - Y - * - tz_convert - N - * - tz_localize - N - * - :func:`unique` - Y - * - :func:`unstack` - P - ``fill_value`` , ``sort`` * - :func:`update` - Y - * - :func:`value_counts` - Y - * - :func:`var` - P - ``skipna`` * - view - N - * - :func:`where` - P - ``axis`` , ``inplace`` , ``level`` * - :func:`xs` - P - ``axis`` , ``drop_level`` TimedeltaIndex API ---------------------------------------------------------------------------------------------------- .. currentmodule:: pyspark.pandas.TimedeltaIndex .. 
list-table:: :header-rows: 1 * - API - Implemented - Missing parameters * - :func:`all` - Y - * - :func:`any` - Y - * - :func:`append` - Y - * - :func:`argmax` - P - ``axis`` , ``skipna`` * - :func:`argmin` - P - ``axis`` , ``skipna`` * - argsort - N - * - as_unit - N - * - :func:`asof` - Y - * - asof_locs - N - * - :func:`astype` - P - ``copy`` * - ceil - N - * - :func:`copy` - Y - * - :func:`delete` - Y - * - diff - N - * - :func:`difference` - Y - * - :func:`drop` - P - ``errors`` * - :func:`drop_duplicates` - Y - * - :func:`droplevel` - Y - * - :func:`dropna` - Y - * - duplicated - N - * - :func:`equals` - Y - * - :func:`factorize` - Y - * - :func:`fillna` - P - ``downcast`` * - floor - N - * - format - N - * - get_indexer - N - * - get_indexer_for - N - * - get_indexer_non_unique - N - * - :func:`get_level_values` - Y - * - get_loc - N - * - get_slice_bound - N - * - groupby - N - * - :func:`holds_integer` - Y - * - :func:`identical` - Y - * - infer_objects - N - * - :func:`insert` - Y - * - :func:`intersection` - P - ``sort`` * - is\_ - N - * - :func:`is_boolean` - Y - * - :func:`is_categorical` - Y - * - :func:`is_floating` - Y - * - :func:`is_integer` - Y - * - :func:`is_interval` - Y - * - :func:`is_numeric` - Y - * - :func:`is_object` - Y - * - :func:`isin` - P - ``level`` * - :func:`isna` - Y - * - :func:`isnull` - Y - * - :func:`item` - Y - * - join - N - * - :func:`map` - Y - * - :func:`max` - P - ``axis`` , ``skipna`` * - mean - N - * - median - N - * - memory_usage - N - * - :func:`min` - P - ``axis`` , ``skipna`` * - :func:`notna` - Y - * - :func:`notnull` - Y - * - :func:`nunique` - Y - * - putmask - N - * - ravel - N - * - reindex - N - * - :func:`rename` - Y - * - :func:`repeat` - P - ``axis`` * - round - N - * - searchsorted - N - * - :func:`set_names` - Y - * - :func:`shift` - P - ``freq`` * - slice_indexer - N - * - slice_locs - N - * - :func:`sort` - Y - * - :func:`sort_values` - P - ``key`` , ``na_position`` * - sortlevel - N - * - std - N - 
    * - sum
      - N
      -
    * - :func:`symmetric_difference`
      - Y
      -
    * - :func:`take`
      - P
      - ``allow_fill`` , ``axis`` , ``fill_value``
    * - to_flat_index
      - N
      -
    * - :func:`to_frame`
      - Y
      -
    * - :func:`to_list`
      - Y
      -
    * - :func:`to_numpy`
      - P
      - ``na_value``
    * - to_pytimedelta
      - N
      -
    * - :func:`to_series`
      - P
      - ``index``
    * - :func:`tolist`
      - Y
      -
    * - total_seconds
      - N
      -
    * - :func:`transpose`
      - Y
      -
    * - :func:`union`
      - Y
      -
    * - :func:`unique`
      - Y
      -
    * - :func:`value_counts`
      - Y
      -
    * - :func:`view`
      - Y
      -
    * - where
      - N
      -

General Function API
----------------------------------------------------------------------------------------------------

.. currentmodule:: pyspark.pandas

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - array
      - N
      -
    * - bdate_range
      - N
      -
    * - :func:`concat`
      - P
      - ``copy`` , ``keys`` , ``levels`` , ``names`` , ``verify_integrity``
    * - crosstab
      - N
      -
    * - cut
      - N
      -
    * - :func:`date_range`
      - P
      - ``unit``
    * - eval
      - N
      -
    * - factorize
      - N
      -
    * - from_dummies
      - N
      -
    * - :func:`get_dummies`
      - Y
      -
    * - infer_freq
      - N
      -
    * - interval_range
      - N
      -
    * - :func:`isna`
      - Y
      -
    * - :func:`isnull`
      - Y
      -
    * - :func:`json_normalize`
      - P
      - ``errors`` , ``max_level`` , ``meta`` , ``meta_prefix`` , ``record_path`` and more. See the `pandas.json_normalize `__ and `pyspark.pandas.json_normalize `__ for detail.
    * - lreshape
      - N
      -
    * - :func:`melt`
      - P
      - ``col_level`` , ``ignore_index``
    * - :func:`merge`
      - P
      - ``copy`` , ``indicator`` , ``left`` , ``sort`` , ``validate``
    * - :func:`merge_asof`
      - Y
      -
    * - merge_ordered
      - N
      -
    * - :func:`notna`
      - Y
      -
    * - :func:`notnull`
      - Y
      -
    * - period_range
      - N
      -
    * - pivot
      - N
      -
    * - pivot_table
      - N
      -
    * - qcut
      - N
      -
    * - :func:`read_clipboard`
      - P
      - ``dtype_backend``
    * - :func:`read_csv`
      - P
      - ``cache_dates`` , ``chunksize`` , ``compression`` , ``converters`` , ``date_format`` and more. See the `pandas.read_csv `__ and `pyspark.pandas.read_csv `__ for detail.
    * - :func:`read_excel`
      - P
      - ``date_format`` , ``decimal`` , ``dtype_backend`` , ``engine_kwargs`` , ``na_filter`` and more. See the `pandas.read_excel `__ and `pyspark.pandas.read_excel `__ for details.
    * - read_feather
      - N
      -
    * - read_fwf
      - N
      -
    * - read_gbq
      - N
      -
    * - read_hdf
      - N
      -
    * - :func:`read_html`
      - P
      - ``dtype_backend`` , ``extract_links`` , ``storage_options``
    * - :func:`read_json`
      - P
      - ``chunksize`` , ``compression`` , ``convert_axes`` , ``convert_dates`` , ``date_unit`` and more. See the `pandas.read_json `__ and `pyspark.pandas.read_json `__ for details.
    * - :func:`read_orc`
      - P
      - ``dtype_backend`` , ``filesystem``
    * - :func:`read_parquet`
      - P
      - ``dtype_backend`` , ``engine`` , ``filesystem`` , ``filters`` , ``storage_options`` and more. See the `pandas.read_parquet `__ and `pyspark.pandas.read_parquet `__ for details.
    * - read_pickle
      - N
      -
    * - read_sas
      - N
      -
    * - read_spss
      - N
      -
    * - :func:`read_sql`
      - P
      - ``chunksize`` , ``coerce_float`` , ``dtype`` , ``dtype_backend`` , ``params`` and more. See the `pandas.read_sql `__ and `pyspark.pandas.read_sql `__ for details.
    * - :func:`read_sql_query`
      - P
      - ``chunksize`` , ``coerce_float`` , ``dtype`` , ``dtype_backend`` , ``params`` and more. See the `pandas.read_sql_query `__ and `pyspark.pandas.read_sql_query `__ for details.
    * - :func:`read_sql_table`
      - P
      - ``chunksize`` , ``coerce_float`` , ``dtype_backend`` , ``parse_dates``
    * - read_stata
      - N
      -
    * - :func:`read_table`
      - P
      - ``cache_dates`` , ``chunksize`` , ``comment`` , ``compression`` , ``converters`` and more. See the `pandas.read_table `__ and `pyspark.pandas.read_table `__ for details.
    * - read_xml
      - N
      -
    * - set_eng_float_format
      - N
      -
    * - show_versions
      - N
      -
    * - test
      - N
      -
    * - :func:`timedelta_range`
      - P
      - ``unit``
    * - :func:`to_datetime`
      - P
      - ``cache`` , ``dayfirst`` , ``exact`` , ``utc`` , ``yearfirst``
    * - :func:`to_numeric`
      - P
      - ``downcast`` , ``dtype_backend``
    * - to_pickle
      - N
      -
    * - :func:`to_timedelta`
      - Y
      -
    * - unique
      - N
      -
    * - value_counts
      - N
      -
    * - wide_to_long
      - N
      -

Expanding API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.window.Expanding

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - agg
      - N
      -
    * - aggregate
      - N
      -
    * - apply
      - N
      -
    * - corr
      - N
      -
    * - :func:`count`
      - P
      - ``numeric_only``
    * - cov
      - N
      -
    * - :func:`kurt`
      - P
      - ``numeric_only``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - median
      - N
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only`` , ``q``
    * - rank
      - N
      -
    * - sem
      - N
      -
    * - :func:`skew`
      - P
      - ``numeric_only``
    * - :func:`std`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`var`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``

ExpandingGroupby API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.window.ExpandingGroupby

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - agg
      - N
      -
    * - aggregate
      - N
      -
    * - apply
      - N
      -
    * - corr
      - N
      -
    * - :func:`count`
      - P
      - ``numeric_only``
    * - cov
      - N
      -
    * - :func:`kurt`
      - P
      - ``numeric_only``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - median
      - N
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only`` , ``q``
    * - rank
      - N
      -
    * - sem
      - N
      -
    * - :func:`skew`
      - P
      - ``numeric_only``
    * - :func:`std`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`var`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``

Rolling API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.window.Rolling

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - agg
      - N
      -
    * - aggregate
      - N
      -
    * - apply
      - N
      -
    * - corr
      - N
      -
    * - :func:`count`
      - P
      - ``numeric_only``
    * - cov
      - N
      -
    * - :func:`kurt`
      - P
      - ``numeric_only``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - median
      - N
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only`` , ``q``
    * - rank
      - N
      -
    * - sem
      - N
      -
    * - :func:`skew`
      - P
      - ``numeric_only``
    * - :func:`std`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`var`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``

RollingGroupby API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.window.RollingGroupby

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - agg
      - N
      -
    * - aggregate
      - N
      -
    * - apply
      - N
      -
    * - corr
      - N
      -
    * - :func:`count`
      - P
      - ``numeric_only``
    * - cov
      - N
      -
    * - :func:`kurt`
      - P
      - ``numeric_only``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - median
      - N
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only`` , ``q``
    * - rank
      - N
      -
    * - sem
      - N
      -
    * - :func:`skew`
      - P
      - ``numeric_only``
    * - :func:`std`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`var`
      - P
      - ``ddof`` , ``engine`` , ``engine_kwargs`` , ``numeric_only``

Window API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.window.Window

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - agg
      - N
      -
    * - aggregate
      - N
      -
    * - mean
      - N
      -
    * - std
      - N
      -
    * - sum
      - N
      -
    * - var
      - N
      -

DataFrameGroupBy API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.groupby.DataFrameGroupBy

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - :func:`agg`
      - P
      - ``engine`` , ``engine_kwargs`` , ``func``
    * - :func:`aggregate`
      - P
      - ``engine`` , ``engine_kwargs`` , ``func``
    * - :func:`all`
      - Y
      -
    * - :func:`any`
      - P
      - ``skipna``
    * - :func:`apply`
      - P
      - ``include_groups``
    * - :func:`bfill`
      - Y
      -
    * - boxplot
      - N
      -
    * - :func:`corr`
      - Y
      -
    * - corrwith
      - N
      -
    * - :func:`count`
      - Y
      -
    * - cov
      - N
      -
    * - :func:`cumcount`
      - Y
      -
    * - :func:`cummax`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`cummin`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`cumprod`
      - P
      - ``axis``
    * - :func:`cumsum`
      - P
      - ``axis``
    * - :func:`describe`
      - P
      - ``exclude`` , ``include`` , ``percentiles``
    * - :func:`diff`
      - P
      - ``axis``
    * - :func:`ewm`
      - Y
      -
    * - :func:`expanding`
      - Y
      -
    * - :func:`ffill`
      - Y
      -
    * - :func:`fillna`
      - P
      - ``downcast``
    * - :func:`filter`
      - P
      - ``dropna``
    * - :func:`first`
      - P
      - ``skipna``
    * - :func:`get_group`
      - P
      - ``obj``
    * - :func:`head`
      - Y
      -
    * - hist
      - N
      -
    * - :func:`idxmax`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`idxmin`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`last`
      - P
      - ``skipna``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`median`
      - Y
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs``
    * - ngroup
      - N
      -
    * - :func:`nunique`
      - Y
      -
    * - ohlc
      - N
      -
    * - pct_change
      - N
      -
    * - pipe
      - N
      -
    * - :func:`prod`
      - Y
      -
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only``
    * - :func:`rank`
      - P
      - ``axis`` , ``na_option`` , ``pct``
    * - resample
      - N
      -
    * - :func:`rolling`
      - Y
      -
    * - sample
      - N
      -
    * - :func:`sem`
      - P
      - ``numeric_only``
    * - :func:`shift`
      - P
      - ``axis`` , ``freq`` , ``suffix``
    * - :func:`size`
      - Y
      -
    * - :func:`skew`
      - P
      - ``axis`` , ``numeric_only`` , ``skipna``
    * - :func:`std`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`tail`
      - Y
      -
    * - take
      - N
      -
    * - :func:`transform`
      - P
      - ``engine`` , ``engine_kwargs``
    * - value_counts
      - N
      -
    * - :func:`var`
      - P
      - ``engine`` , ``engine_kwargs``

GroupBy API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.groupby.GroupBy

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - :func:`agg`
      - P
      - ``func``
    * - :func:`aggregate`
      - P
      - ``func``
    * - :func:`all`
      - Y
      -
    * - :func:`any`
      - P
      - ``skipna``
    * - :func:`apply`
      - P
      - ``include_groups``
    * - :func:`bfill`
      - Y
      -
    * - :func:`count`
      - Y
      -
    * - :func:`cumcount`
      - Y
      -
    * - :func:`cummax`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`cummin`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`cumprod`
      - P
      - ``axis``
    * - :func:`cumsum`
      - P
      - ``axis``
    * - describe
      - N
      -
    * - :func:`diff`
      - P
      - ``axis``
    * - :func:`ewm`
      - Y
      -
    * - :func:`expanding`
      - Y
      -
    * - :func:`ffill`
      - Y
      -
    * - :func:`first`
      - P
      - ``skipna``
    * - :func:`get_group`
      - P
      - ``obj``
    * - :func:`head`
      - Y
      -
    * - :func:`last`
      - P
      - ``skipna``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`median`
      - Y
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs``
    * - ngroup
      - N
      -
    * - ohlc
      - N
      -
    * - pct_change
      - N
      -
    * - pipe
      - N
      -
    * - :func:`prod`
      - Y
      -
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only``
    * - :func:`rank`
      - P
      - ``axis`` , ``na_option`` , ``pct``
    * - resample
      - N
      -
    * - :func:`rolling`
      - Y
      -
    * - sample
      - N
      -
    * - :func:`sem`
      - P
      - ``numeric_only``
    * - :func:`shift`
      - P
      - ``axis`` , ``freq`` , ``suffix``
    * - :func:`size`
      - Y
      -
    * - :func:`std`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`tail`
      - Y
      -
    * - :func:`var`
      - P
      - ``engine`` , ``engine_kwargs``

SeriesGroupBy API
----------------------------------------------------------------------------------------------------
.. currentmodule:: pyspark.pandas.groupby.SeriesGroupBy

.. list-table::
    :header-rows: 1

    * - API
      - Implemented
      - Missing parameters
    * - :func:`agg`
      - P
      - ``engine`` , ``engine_kwargs`` , ``func``
    * - :func:`aggregate`
      - P
      - ``engine`` , ``engine_kwargs`` , ``func``
    * - :func:`all`
      - Y
      -
    * - :func:`any`
      - P
      - ``skipna``
    * - :func:`apply`
      - Y
      -
    * - :func:`bfill`
      - Y
      -
    * - corr
      - N
      -
    * - :func:`count`
      - Y
      -
    * - cov
      - N
      -
    * - :func:`cumcount`
      - Y
      -
    * - :func:`cummax`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`cummin`
      - P
      - ``axis`` , ``numeric_only``
    * - :func:`cumprod`
      - P
      - ``axis``
    * - :func:`cumsum`
      - P
      - ``axis``
    * - describe
      - N
      -
    * - :func:`diff`
      - P
      - ``axis``
    * - :func:`ewm`
      - Y
      -
    * - :func:`expanding`
      - Y
      -
    * - :func:`ffill`
      - Y
      -
    * - :func:`fillna`
      - P
      - ``downcast``
    * - :func:`filter`
      - P
      - ``dropna``
    * - :func:`first`
      - P
      - ``skipna``
    * - :func:`get_group`
      - P
      - ``obj``
    * - :func:`head`
      - Y
      -
    * - hist
      - N
      -
    * - :func:`idxmax`
      - P
      - ``axis``
    * - :func:`idxmin`
      - P
      - ``axis``
    * - :func:`last`
      - P
      - ``skipna``
    * - :func:`max`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`mean`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`median`
      - Y
      -
    * - :func:`min`
      - P
      - ``engine`` , ``engine_kwargs``
    * - ngroup
      - N
      -
    * - :func:`nlargest`
      - P
      - ``keep``
    * - :func:`nsmallest`
      - P
      - ``keep``
    * - :func:`nunique`
      - Y
      -
    * - ohlc
      - N
      -
    * - pct_change
      - N
      -
    * - pipe
      - N
      -
    * - :func:`prod`
      - Y
      -
    * - :func:`quantile`
      - P
      - ``interpolation`` , ``numeric_only``
    * - :func:`rank`
      - P
      - ``axis`` , ``na_option`` , ``pct``
    * - resample
      - N
      -
    * - :func:`rolling`
      - Y
      -
    * - sample
      - N
      -
    * - :func:`sem`
      - P
      - ``numeric_only``
    * - :func:`shift`
      - P
      - ``axis`` , ``freq`` , ``suffix``
    * - :func:`size`
      - Y
      -
    * - :func:`skew`
      - P
      - ``axis`` , ``numeric_only`` , ``skipna``
    * - :func:`std`
      - P
      - ``engine`` , ``engine_kwargs`` , ``numeric_only``
    * - :func:`sum`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`tail`
      - Y
      -
    * - take
      - N
      -
    * - :func:`transform`
      - P
      - ``engine`` , ``engine_kwargs``
    * - :func:`unique`
      - Y
      -
    * - :func:`value_counts`
      - P
      - ``bins`` , ``normalize``
    * - :func:`var`
      - P
      - ``engine`` , ``engine_kwargs``
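
As a rough illustration of how the ``Y`` / ``N`` / ``P`` markers in the tables above might be used when porting pandas code, the sketch below checks a planned call against a few rows copied by hand from the SeriesGroupBy table. The dictionary ``SERIES_GROUPBY_SUPPORT`` and the helper ``unsupported_kwargs`` are hypothetical names introduced only for this example; they are not part of ``pyspark.pandas``, and the excerpt is not generated from the library itself.

```python
# Hand-copied excerpt of the SeriesGroupBy table (assumption: rows transcribed
# from this page, not queried from pyspark).
# API name -> (status, parameters listed as missing).
SERIES_GROUPBY_SUPPORT = {
    "apply": ("Y", ()),
    "describe": ("N", ()),
    "nlargest": ("P", ("keep",)),
    "value_counts": ("P", ("bins", "normalize")),
    "var": ("P", ("engine", "engine_kwargs")),
}


def unsupported_kwargs(api, **kwargs):
    """Return the keyword names that the table lists as missing for `api`.

    Hypothetical helper for illustration only. Raises NotImplementedError
    when the table marks the API itself as 'N'.
    """
    status, missing = SERIES_GROUPBY_SUPPORT[api]
    if status == "N":
        raise NotImplementedError(f"{api} is marked 'N' (not implemented)")
    return sorted(name for name in kwargs if name in missing)


# `normalize` is listed as missing for value_counts; `sort` is not.
print(unsupported_kwargs("value_counts", normalize=True, sort=False))
# `apply` is marked 'Y', so no keyword is reported.
print(unsupported_kwargs("apply", func=len))
```

A check like this only mirrors what the table says; whether a given call actually works still depends on the installed Spark version.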