Package org.apache.hadoop.fs.statistics


@Public @Unstable package org.apache.hadoop.fs.statistics
This package contains support for statistic collection and reporting. This is the public API; implementation classes are to be kept elsewhere.

This package defines two interfaces:

IOStatisticsSource: a source of statistic data, which can be retrieved through a call to IOStatisticsSource.getIOStatistics().

IOStatistics: the statistics retrieved from a statistics source.

The retrieved statistics may be an immutable snapshot, in which case another call to IOStatisticsSource.getIOStatistics() must be made to get updated statistics. Or they may be dynamic, in which case every time a specific statistic is retrieved, the latest value is returned. Callers should assume that if a statistics instance is dynamic, there is no atomicity when querying multiple statistics. If the statistics source is a closeable object (e.g. a stream), the statistics MUST remain valid after that object is closed.

Use pattern:

An application probes an object (filesystem, stream, etc.) to see if it implements IOStatisticsSource and, if it does, calls getIOStatistics() to get its statistics. If the result is non-null, the client has statistics on the current state of the source.
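The probe described above can be sketched as follows. This is a minimal, self-contained illustration: the two nested interfaces are simplified stand-ins for the Hadoop ones (only a counters map is modelled), and `retrieve`, `DemoStream` and the counter name `demo_bytes_read` are hypothetical names invented for the example.

```java
import java.util.Map;

/** Sketch of the probe pattern; the nested interfaces are stand-ins, not the Hadoop classes. */
final class ProbeSketch {

  /** Stand-in for IOStatistics, reduced to counters only (an assumption of this sketch). */
  interface IOStatistics {
    Map<String, Long> counters();
  }

  /** Stand-in for IOStatisticsSource; a null return means "no statistics available". */
  interface IOStatisticsSource {
    default IOStatistics getIOStatistics() {
      return null;
    }
  }

  /** Hypothetical helper: probe any object and return its statistics, or null. */
  static IOStatistics retrieve(Object candidate) {
    if (candidate instanceof IOStatisticsSource) {
      return ((IOStatisticsSource) candidate).getIOStatistics();
    }
    return null;
  }

  /** Hypothetical stream-like source exposing one made-up counter. */
  static final class DemoStream implements IOStatisticsSource {
    @Override
    public IOStatistics getIOStatistics() {
      return () -> Map.of("demo_bytes_read", 4096L);
    }
  }

  public static void main(String[] args) {
    IOStatistics stats = retrieve(new DemoStream());
    if (stats != null) {
      System.out.println(stats.counters());
    }
    // A plain String does not implement the source interface, so the probe yields null.
    System.out.println(retrieve("not a source"));
  }
}
```

The caller checks both conditions the text mentions: whether the object is a source at all, and whether the source actually returned statistics.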

The expectation is that a statistics source is dynamic: when a value is looked up, the most recent value is returned. When iterating through the set, the values of the iterator SHOULD be frozen at the time the iterator was requested.
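To make the difference between a dynamic lookup and a frozen view concrete, here is a small sketch. It does not use the Hadoop classes; `DynamicVersusSnapshot`, its methods and the counter name are assumptions made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch contrasting a live (dynamic) lookup with a frozen copy of the values. */
final class DynamicVersusSnapshot {

  /** Live counter updated by the hypothetical source as it performs IO. */
  private final AtomicLong bytesRead = new AtomicLong();

  /** Called by the source whenever it reads data. */
  void recordRead(long bytes) {
    bytesRead.addAndGet(bytes);
  }

  /** Dynamic lookup: every call returns the latest value. */
  long currentBytesRead() {
    return bytesRead.get();
  }

  /** Frozen view: the value is copied once and the returned map is immutable. */
  Map<String, Long> snapshot() {
    return Map.of("demo_bytes_read", bytesRead.get());
  }

  public static void main(String[] args) {
    DynamicVersusSnapshot source = new DynamicVersusSnapshot();
    source.recordRead(100);
    Map<String, Long> frozen = source.snapshot();
    source.recordRead(50);
    // The live lookup reflects both reads; the frozen map only the first one.
    System.out.println(source.currentBytesRead());
    System.out.println(frozen.get("demo_bytes_read"));
  }
}
```

An iterator over the frozen map keeps returning the values captured when the copy was taken, which is the behaviour the text asks of iterators.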

These statistics can be used to log operations, profile applications, and make assertions about the state of the output.

The names of statistics are a matter of choice of the specific source. However, StoreStatisticNames contains a set of names recommended for object store operations. StreamStatisticNames declares recommended names for statistics provided for input and output streams.

Utility classes:

  • IOStatisticsSupport. General support, including the ability to take a serializable snapshot of the current state of an IOStatistics instance.
  • IOStatisticsLogging. Methods for robust/on-demand string conversion, designed for use in logging statements and toString() implementations.
  • IOStatisticsSnapshot. A static snapshot of statistics which can be marshalled via Java serialization or as JSON via Jackson. It supports aggregation, so it can be used to generate aggregate statistics.
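As a rough illustration of what aggregating snapshots means for simple counters, the sketch below sums two counter maps key by key. It is not the IOStatisticsSnapshot implementation; `AggregateSketch`, `aggregate` and the counter names are hypothetical, and only additive counters are modelled.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of counter aggregation across two snapshots; not the Hadoop implementation. */
final class AggregateSketch {

  /** Sum counters key by key; a key present in only one input keeps its value. */
  static Map<String, Long> aggregate(Map<String, Long> left, Map<String, Long> right) {
    Map<String, Long> total = new TreeMap<>(left);
    right.forEach((key, value) -> total.merge(key, value, Long::sum));
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> taskOne = Map.of("demo_reads", 1L, "demo_writes", 2L);
    Map<String, Long> taskTwo = Map.of("demo_writes", 3L, "demo_deletes", 4L);
    System.out.println(aggregate(taskOne, taskTwo));
  }
}
```

Statistics such as minimums or maximums would need a different combining function than addition; the sketch leaves those out.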

Implementors' notes:

  1. IOStatistics keys SHOULD be standard names where possible.
  2. An IOStatistics instance MUST be unique to that specific instance of IOStatisticsSource (i.e. not shared the way StorageStatistics are).
  3. MUST return the same values irrespective of the thread from which the statistics are retrieved or their keys evaluated.
  4. MUST NOT remove keys once a statistic instance has been created.
  5. MUST NOT add keys once a statistic instance has been created.
  6. MUST NOT block for long periods of time while blocking operations (reads, writes) are taking place in the source. That is: minimal synchronization points (AtomicLongs etc.) may be used to share values, but retrieval of statistics should be fast and return values even while slow/blocking remote IO is underway.
  7. MUST support value enumeration and retrieval after the source has been closed.
  8. SHOULD NOT have back-references to potentially expensive objects (filesystem instances etc.)
  9. SHOULD provide statistics which can be added to generate aggregate statistics.
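Several of these rules can be illustrated with a small sketch: a key set fixed at construction, AtomicLong values that can be read without blocking, and lookups that keep working after close. The class `CountingSourceSketch` and its methods are hypothetical and do not implement the real Hadoop interfaces; ignoring updates to unknown keys and to a closed source are design assumptions of the example.

```java
import java.io.Closeable;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a statistics holder following the implementors' notes; not a Hadoop class. */
final class CountingSourceSketch implements Closeable {

  /** Keys are fixed at construction: none are added or removed afterwards. */
  private final Map<String, AtomicLong> counters;

  /** Set on close; recording stops, but reads stay valid. */
  private volatile boolean closed;

  CountingSourceSketch(String... keys) {
    Map<String, AtomicLong> map = new LinkedHashMap<>();
    for (String key : keys) {
      map.put(key, new AtomicLong());
    }
    this.counters = Collections.unmodifiableMap(map);
  }

  /** Update a counter; unknown keys are ignored rather than added. */
  void increment(String key, long delta) {
    AtomicLong counter = counters.get(key);
    if (counter != null && !closed) {
      counter.addAndGet(delta);
    }
  }

  /** Non-blocking read, valid from any thread and after close; null for unknown keys. */
  Long lookup(String key) {
    AtomicLong counter = counters.get(key);
    return counter == null ? null : Long.valueOf(counter.get());
  }

  /** Enumeration of the fixed key set. */
  Set<String> keys() {
    return counters.keySet();
  }

  @Override
  public void close() {
    closed = true;
  }

  public static void main(String[] args) {
    CountingSourceSketch source = new CountingSourceSketch("demo_reads", "demo_bytes");
    source.increment("demo_reads", 2);
    source.close();
    // Values and keys remain readable after the source is closed.
    System.out.println(source.keys());
    System.out.println(source.lookup("demo_reads"));
  }
}
```

The holder keeps no reference to a filesystem or stream, so retaining it after close does not pin any expensive object in memory.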
  • Class
    Description
    org.apache.hadoop.fs.statistics.BufferedIOStatisticsInputStream
    An extension of BufferedInputStream which implements IOStatisticsSource and forwards requests for the IOStatistics to the wrapped stream.
    org.apache.hadoop.fs.statistics.BufferedIOStatisticsOutputStream
    An extension of BufferedOutputStream which implements IOStatisticsSource and forwards requests for the IOStatistics to the wrapped stream.
    Summary of duration tracking statistics as extracted from an IOStatistics instance.
    org.apache.hadoop.fs.statistics.DurationTracker
    Interface to be implemented by objects which can track duration.
    org.apache.hadoop.fs.statistics.DurationTrackerFactory
    Interface for a source of duration tracking.
    Common statistic names for Filesystem-level statistics, including internals.
    IO Statistics.
    Interface exported by classes which support aggregation of IOStatistics.
    org.apache.hadoop.fs.statistics.IOStatisticsContext
    An interface defined to capture thread-level IOStatistics by using per thread context.
    Utility operations to convert IO Statistics sources/instances to strings, especially for robust logging.
    Setter for IOStatistics entries.
    Snapshot of statistics from a different source.
    org.apache.hadoop.fs.statistics.IOStatisticsSource
    A source of IO statistics.
    Support for working with IOStatistics.
    A mean statistic represented as the sum and the sample count; the mean is calculated on demand.
    Common statistic names for object store operations.
    These are common statistic names.
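The table above describes a mean statistic stored as a sum and a sample count, with the mean calculated on demand. A minimal sketch of that representation is shown below. `MeanSketch` is a hypothetical class written for this example, not the Hadoop one, and returning 0.0 for zero samples is an assumption of the sketch.

```java
/** Sketch of a mean kept as (samples, sum); not the Hadoop class. */
final class MeanSketch {

  private final long samples;
  private final long sum;

  MeanSketch(long samples, long sum) {
    this.samples = samples;
    this.sum = sum;
  }

  /** Mean computed on demand; an empty statistic reports 0.0 (assumption). */
  double mean() {
    return samples > 0 ? ((double) sum) / samples : 0.0;
  }

  /** Aggregation adds the sums and the sample counts; it never averages the means. */
  MeanSketch add(MeanSketch other) {
    return new MeanSketch(samples + other.samples, sum + other.sum);
  }

  public static void main(String[] args) {
    MeanSketch first = new MeanSketch(2, 10);
    MeanSketch second = new MeanSketch(3, 50);
    System.out.println(first.add(second).mean());
  }
}
```

Keeping the sum and count separate is what makes the statistic aggregatable: two means cannot be combined correctly without knowing how many samples each one covers.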