pyspark.pandas.Series.autocorr

Series.autocorr(lag=1)

Compute the lag-N autocorrelation.

This method computes the Pearson correlation between the Series and its shifted self.

Note

The current implementation of autocorr uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method with very large datasets.

New in version 3.4.0.

Parameters
lag : int, default 1

Number of lags to apply before performing autocorrelation.

Returns
float

The Pearson correlation between self and self.shift(lag).

See also

Series.corr

Compute the correlation between two Series.

Series.shift

Shift index by desired number of periods.

DataFrame.corr

Compute pairwise correlation of columns.

Notes

If the Pearson correlation is not well defined, NaN is returned.
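
As a rough illustration of the definition (not the pandas-on-Spark implementation), the sketch below computes the Pearson correlation between a sequence and its shifted self in plain Python. The helper name `lag_autocorr` is hypothetical, and the sketch assumes that any pair containing a missing value is dropped before the correlation is computed.

```python
import math


def lag_autocorr(values, lag=1):
    """Pearson correlation between `values` and `values` shifted by `lag`.

    Hypothetical helper for illustration only; assumes pairs that
    contain NaN are dropped before computing the correlation.
    """
    n = len(values)
    pairs = []
    for i in range(n):
        j = i - lag  # a shift by `lag` aligns values[i - lag] with position i
        if 0 <= j < n:
            x, y = values[i], values[j]
            if not (math.isnan(x) or math.isnan(y)):
                pairs.append((x, y))

    # Fewer than two complete pairs: the correlation is not defined.
    if len(pairs) < 2:
        return math.nan

    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)

    # Zero variance on either side: the correlation is not defined.
    if var_x == 0 or var_y == 0:
        return math.nan

    return cov / math.sqrt(var_x * var_y)


data = [0.2, 0.0, 0.6, 0.2, math.nan, 0.5, 0.6]
print(lag_autocorr(data))      # lag-1 autocorrelation
print(lag_autocorr(data, 6))   # only one overlapping pair remains
```

Under these assumptions the lag-1 result for `data` is about -0.1412, and a lag of 6 yields NaN because only one pair of values overlaps.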

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> s = ps.Series([.2, .0, .6, .2, np.nan, .5, .6])
>>> s.autocorr()  
-0.141219...
>>> s.autocorr(0)  
1.0...
>>> s.autocorr(2)  
0.970725...
>>> s.autocorr(-3)  
0.277350...
>>> s.autocorr(5)  
-1.000000...
>>> s.autocorr(6)  
nan

If the Pearson correlation is not well defined, NaN is returned. In the Series below, the values that have a lag-1 predecessor are all 0, so their variance is zero.

>>> s = ps.Series([1, 0, 0, 0])
>>> s.autocorr()
nan