pyspark.sql.plot.core.PySparkPlotAccessor.hist

PySparkPlotAccessor.hist(column=None, bins=10, **kwargs)

Draw one histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data.

Parameters
column: str or list of str, optional

Column name or list of names to be used for creating the histogram plot. If None (default), all numeric columns will be used.

bins: integer, default 10

Number of histogram bins to be used.

**kwargs

Additional keyword arguments.

Returns
plotly.graph_objs.Figure

Examples

>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.hist(bins=4)  
>>> df.plot.hist(column=["length", "width"])  
>>> df.plot.hist(column="length", bins=4)
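
To make the effect of `bins` concrete, the sketch below counts the `length` values from the example above into four bins using plain Python, with no Spark session required. It assumes equal-width bins spanning the column's minimum and maximum, with the maximum value placed in the last bin; this is a common histogram convention and is an assumption here, not a description of PySpark's internals. The helper `equal_width_histogram` is hypothetical and is not part of the PySpark API.

```python
def equal_width_histogram(values, bins=10):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Illustrative helper only; not part of PySpark.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        # If all values are equal (width == 0), everything goes in bin 0.
        idx = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    # bins + 1 edges delimit the bins.
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# The "length" column from the example data above.
lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
edges, counts = equal_width_histogram(lengths, bins=4)
print(counts)  # [2, 1, 1, 1]
```

With `bins=4` the range 4.9 to 7.0 is split into intervals of width 0.525, so a larger `bins` value yields narrower intervals and a finer view of the distribution.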