pyspark.pandas.DataFrame.corrwith

DataFrame.corrwith(other, axis=0, drop=False, method='pearson')

Compute pairwise correlation.

Correlations are computed between the rows or columns of this DataFrame and the rows or columns of a Series or another DataFrame. The two objects are aligned along both axes before the correlations are computed.
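To make the alignment step concrete, the following is a minimal pure-Python sketch of column-wise Pearson correlation after label alignment. It is an illustration only, not the pandas-on-Spark implementation: the helper names `pearson` and `corrwith_sketch` are hypothetical, columns are plain lists, and rows are assumed to be already aligned by position.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def corrwith_sketch(left, right):
    """Correlate same-named columns; labels found on one side only get NaN."""
    result = {}
    for label in sorted(set(left) | set(right)):
        if label in left and label in right:
            result[label] = pearson(left[label], right[label])
        else:
            result[label] = math.nan
    return result


# Plain-dict stand-ins for two frames that share only columns "A" and "C".
df1 = {"A": [1, 5, 7, 8], "X": [5, 8, 4, 3], "C": [10, 4, 9, 3]}
df2 = {"A": [5, 3, 6, 4], "B": [11, 2, 4, 3], "C": [4, 3, 8, 5]}
result = corrwith_sketch(df1, df2)
```

With these inputs, `result["A"]` and `result["C"]` are approximately -0.0417 and 0.3954, while `result["B"]` and `result["X"]` are NaN because those labels exist in only one of the inputs.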

Added in version 3.4.0.

Parameters:

other : DataFrame, Series

Object with which to compute correlations.

axis : int, default 0 or ‘index’

Only 0 is currently supported.

drop : bool, default False

Drop missing indices from the result, that is, labels that are not present in both objects.
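The effect of `drop` can be sketched in plain Python. This assumes the pandas semantics, where labels present in only one of the two objects appear as NaN unless `drop=True`. The helper `finalize_labels` is hypothetical and not part of the API.

```python
import math


def finalize_labels(correlations, left_labels, right_labels, drop=False):
    """Keep correlations for shared labels; pad the rest with NaN unless drop."""
    shared = set(left_labels) & set(right_labels)
    result = {label: correlations[label] for label in shared}
    if not drop:
        # Labels that exist on exactly one side have no partner column.
        for label in set(left_labels) ^ set(right_labels):
            result[label] = math.nan
    return dict(sorted(result.items()))


# Correlations for the shared labels, taken from the Examples section.
shared_corr = {"A": -0.041703, "C": 0.395437}
left = ["A", "X", "C"]
right = ["A", "B", "C"]

kept = finalize_labels(shared_corr, left, right, drop=False)
dropped = finalize_labels(shared_corr, left, right, drop=True)
```

Here `kept` has the labels A, B, C, and X (with NaN for B and X), while `dropped` contains only A and C.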

method : {‘pearson’, ‘spearman’, ‘kendall’}, default ‘pearson’

pearson : standard correlation coefficient
spearman : Spearman rank correlation
kendall : Kendall Tau correlation coefficient
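For intuition about the two rank-based methods, here is a small pure-Python sketch. It is not the pandas-on-Spark implementation: all function names are hypothetical, Spearman is computed as Pearson correlation on average ranks, and Kendall is assumed to be the tie-aware tau-b variant.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_b(xs, ys):
    """Kendall tau-b from concordant, discordant, and tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from every count
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + ties_x) * (pairs + ties_y))
    if denom == 0:
        return math.nan
    return (concordant - discordant) / denom


# Column values taken from the Examples section.
rho = spearman([1, 5, 7, 8], [11, 2, 4, 3])
tau = kendall_tau_b([1, 5, 7, 8], [5, 3, 6, 4])
```

For these inputs `rho` is -0.4 and `tau` is 0.0, which agrees with the `spearman` and `kendall` outputs for column A in the Examples section.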

Returns:

Series: Pairwise correlations.

See also

DataFrame.corr: Compute pairwise correlation of columns.

Examples

>>> import pyspark.pandas as ps
>>> df1 = ps.DataFrame({
...         "A": [1, 5, 7, 8],
...         "X": [5, 8, 4, 3],
...         "C": [10, 4, 9, 3]})
>>> df1.corrwith(df1[["X", "C"]]).sort_index()
A    NaN
C    1.0
X    1.0
dtype: float64

>>> df2 = ps.DataFrame({
...         "A": [5, 3, 6, 4],
...         "B": [11, 2, 4, 3],
...         "C": [4, 3, 8, 5]})

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2).sort_index()
A   -0.041703
B         NaN
C    0.395437
X         NaN
dtype: float64

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2, method="kendall").sort_index()
A    0.0
B    NaN
C    0.0
X    NaN
dtype: float64

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2.B, method="spearman").sort_index()
A   -0.4
C    0.8
X   -0.2
dtype: float64

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df2.corrwith(df1.X).sort_index()
A   -0.597614
B   -0.151186
C   -0.642857
dtype: float64