pyspark.sql.plot.core.PySparkPlotAccessor.hist

PySparkPlotAccessor.hist(column=None, bins=10, **kwargs)

Draw one histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data.
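To make the role of `bins` concrete, here is a minimal pure-Python sketch of what a histogram counts, using the `length` values from the examples below. It assumes equal-width bins spanning the column's minimum to maximum, which is the default convention of `numpy.histogram`. The helper `equal_width_hist` is hypothetical and is not PySpark's implementation.

```python
def equal_width_hist(values, bins=10):
    """Hypothetical helper: count values into `bins` equal-width bins.

    Bins span min(values)..max(values). Edge handling is modeled on
    numpy.histogram's defaults: the last bin is closed on the right, and
    the range is padded by 0.5 on each side when all values are equal.
    """
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Avoid a zero-width range when every value is identical.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    # bins + 1 edges; the last edge is set to hi exactly.
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value falls on the last edge; keep it in the last bin.
        counts[min(idx, bins - 1)] += 1
    return edges, counts


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
edges, counts = equal_width_hist(lengths, bins=4)
print(counts)  # [2, 1, 1, 1]
```

With `bins=4`, the range 4.9 to 7.0 is split into four bins of width 0.525. Two values (4.9 and 5.1) land in the first bin, and one value lands in each of the others.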

Parameters

column: str or list of str, optional: Column name or list of column names to use for the histogram. If None (default), all numeric columns are used.
bins: integer, default 10: Number of histogram bins to use.
**kwargs: Additional keyword arguments.
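The `column=None` default selects every numeric column. As a hedged illustration of that rule, the hypothetical helper `numeric_columns` below filters `(name, type)` pairs in the shape that `DataFrame.dtypes` returns. It assumes Spark's numeric types appear there as `tinyint`, `smallint`, `int`, `bigint`, `float`, `double`, or `decimal(p,s)`. This is a sketch of the documented behavior, not PySpark's code. In the example data below, `species` holds Python ints, which Spark infers as `bigint`, so it counts as numeric too.

```python
# Assumption: simple-string names of Spark numeric types as shown by DataFrame.dtypes.
# Decimal columns appear as "decimal(p,s)", so only the prefix before "(" is compared.
NUMERIC_TYPE_NAMES = frozenset(
    {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}
)


def numeric_columns(dtypes):
    """Hypothetical helper: return names of numeric columns from (name, type) pairs."""
    # Exact match on the base type name, so e.g. "interval" is not mistaken for "int".
    return [name for name, dtype in dtypes if dtype.split("(")[0] in NUMERIC_TYPE_NAMES]


# Pairs shaped like DataFrame.dtypes output; the "name" string column is hypothetical.
example_dtypes = [
    ("length", "double"),
    ("width", "double"),
    ("species", "bigint"),
    ("name", "string"),
]
print(numeric_columns(example_dtypes))  # ['length', 'width', 'species']
```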

Returns

plotly.graph_objs.Figure

Examples

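>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.hist(bins=4)  # all numeric columns (length, width, species), 4 bins
>>> df.plot.hist(column=["length", "width"])  # two columns, default 10 bins
>>> df.plot.hist(column="length", bins=4)  # a single column, 4 bins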