pyspark.sql.plot.core.PySparkPlotAccessor.hist

PySparkPlotAccessor.hist(column=None, bins=10, **kwargs)

Draw one histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data: the range of values is divided into bins, and each bin shows how many values fall into it.

Parameters
column: str or list of str, optional

Column name or list of names to be used for creating the histogram plot. If None (default), all numeric columns will be used. If no numeric columns exist, behavior may depend on the plot backend.
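The rules for `column` can be sketched in plain Python as follows. This is an illustration of the documented behavior, not PySpark's code: the helper `resolve_hist_columns`, the `(name, type)` schema stand-in and the set of numeric type names are all hypothetical. The sketch returns an empty list when no numeric columns exist and does not model what the backend does in that case.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical stand-in for a DataFrame schema: (column name, type name) pairs.
Schema = Sequence[Tuple[str, str]]

# Type names treated as numeric in this sketch (an assumption, loosely modeled on Spark SQL types).
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}


def resolve_hist_columns(column: Optional[Union[str, List[str]]], schema: Schema) -> List[str]:
    """Return the column names a histogram would be drawn for (illustrative only)."""
    if column is None:
        # Default: every numeric column, in schema order (may be empty).
        return [name for name, dtype in schema if dtype in NUMERIC_TYPES]
    if isinstance(column, str):
        # A single column name is wrapped in a list.
        return [column]
    # A list of names is used as given.
    return list(column)


schema = [("length", "double"), ("width", "double"), ("species", "bigint"), ("label", "string")]
print(resolve_hist_columns(None, schema))      # ['length', 'width', 'species']
print(resolve_hist_columns("length", schema))  # ['length']
```

With the default `column=None`, the non-numeric `label` column is skipped, while an explicit name or list is passed through unchanged.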

bins: int, default 10

Number of histogram bins to be used.
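To show what `bins` controls, here is a minimal pure-Python sketch of equal-width binning. It assumes the common scheme used by `numpy.histogram`: the range from a column's minimum to its maximum is split into `bins` intervals of equal width, each half-open except the last, which also includes the maximum. This is not PySpark's distributed implementation, and `equal_width_histogram` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def equal_width_histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Split [min, max] into `bins` equal-width intervals and count the values in each.

    Illustrative only: intervals are half-open [lo, hi) except the last,
    which also includes the maximum value.
    """
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]  # bins + 1 boundaries
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return edges, counts


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]  # the "length" column from the example below
edges, counts = equal_width_histogram(lengths, bins=4)
print(counts)  # [2, 1, 1, 1]
```

Raising `bins` narrows each interval, giving a finer but noisier view of the distribution.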

**kwargs

Additional keyword arguments.

Returns
plotly.graph_objs.Figure

Examples

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> df = spark.createDataFrame(data, columns)
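>>> # One histogram over all numeric columns (length, width, species), using 4 bins
>>> df.plot.hist(bins=4)
>>> # Restrict the plot to selected columns
>>> df.plot.hist(column=["length", "width"])
>>> df.plot.hist(column="length", bins=4)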