pyspark.sql.plot.core.PySparkPlotAccessor.hist
- PySparkPlotAccessor.hist(column=None, bins=10, **kwargs)
Draw one histogram of the DataFrame’s columns.
A histogram is a representation of the distribution of data.
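As a rough illustration of what such a plot summarizes, the following sketch counts a column's values into equal-width bins between its minimum and maximum. Equal-width binning is an assumption made for illustration, and `equal_width_histogram` is a hypothetical helper. It is not PySpark's implementation.

```python
from typing import List, Sequence, Tuple


def equal_width_histogram(
    values: Sequence[float], bins: int = 10
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: return (edges, counts) for `bins` equal-width bins.

    Assumes equal-width bins spanning min(values)..max(values); this is an
    illustration, not PySpark's actual histogram code.
    """
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Widen a degenerate range so every bin has non-zero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bins are half-open [left, right); the last bin also includes the maximum.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


# The "length" values from the example DataFrame, split into 4 bins.
edges, counts = equal_width_histogram([5.1, 4.9, 7.0, 6.4, 5.9], bins=4)
```

Here `bins=4` yields five edges from 4.9 to 7.0, and each value lands in exactly one bin, so the counts sum to the number of rows.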
- Parameters
- column: str or list of str, optional
Column name or list of names to be used for creating the histogram plot. If None (default), all numeric columns will be used. If no numeric columns exist, behavior may depend on the plot backend.
- bins: integer, default 10
Number of histogram bins to be used.
- **kwargs
Additional keyword arguments.
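The documented default for `column` (all numeric columns when it is None) can be sketched in plain Python. The helper `resolve_hist_columns`, the `(name, type)` schema representation, and the set of numeric type names are hypothetical stand-ins for illustration. The sketch does not reproduce PySpark's internal column handling, and passing explicit names through without validation is an assumption.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical, illustrative subset of Spark SQL numeric type names.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def resolve_hist_columns(
    schema: Sequence[Tuple[str, str]],
    column: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical helper: names a histogram would plot under the documented default."""
    if column is None:
        # Documented default: use every numeric column.
        return [name for name, dtype in schema if dtype in NUMERIC_TYPES]
    if isinstance(column, str):
        # A single name is treated like a one-element list.
        return [column]
    # Assumption: explicit names are passed through unchanged (no validation here).
    return list(column)


# Schema of the example DataFrame: Python floats infer as double, Python ints as bigint.
example_schema = [("length", "double"), ("width", "double"), ("species", "bigint")]
```

Applied to the schema of the example DataFrame, the default selects all three columns, so `df.plot.hist(bins=4)` also plots `species`. Passing `column=["length", "width"]` limits the plot to the two measurements.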
- Returns
plotly.graph_objs.Figure
Examples
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.hist(bins=4)