Class RollingFileSystemSink

java.lang.Object
org.apache.hadoop.metrics2.sink.RollingFileSystemSink
All Implemented Interfaces:
Closeable, AutoCloseable, MetricsPlugin, MetricsSink

@Public @Evolving public class RollingFileSystemSink extends Object implements MetricsSink, Closeable

This class is a metrics sink that uses FileSystem to write the metrics logs. Every roll interval a new directory will be created under the path specified by the basepath property. All metrics will be logged to a file in the current interval's directory in a file named <hostname>.log, where <hostname> is the name of the host on which the metrics logging process is running. The base path is set by the <prefix>.sink.<instance>.basepath property. The time zone used to create the current interval's directory name is GMT. If the basepath property isn't specified, it will default to "/tmp", which is the temp directory on whatever default file system is configured for the cluster.

The <prefix>.sink.<instance>.ignore-error property controls whether an exception is thrown when an error is encountered while writing a log file. The default value is true, meaning file errors are quietly swallowed. When set to false, an exception is thrown when a file error is encountered.

The roll-interval property sets the amount of time before rolling the directory. The default value is 1 hour. The roll interval may not be less than 1 minute. The property's value should be given as number unit, where number is an integer value, and unit is a valid unit. Valid units are minute, hour, and day. The units are case insensitive and may be abbreviated or plural. If no units are specified, hours are assumed. For example, "2", "2h", "2 hour", and "2 hours" are all valid ways to specify two hours.
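The parsing rules above can be sketched as follows. This is an illustrative re-implementation, not Hadoop's actual code (which lives inside getRollInterval()); the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of roll-interval parsing as documented above. */
public final class RollIntervalParser {
  // "number unit": an integer, optional whitespace, optional unit word.
  private static final Pattern SPEC =
      Pattern.compile("^\\s*(\\d+)\\s*([A-Za-z]*)\\s*$");

  /** Parse specs such as "2", "2h", "2 hour", "2 hours" into millis. */
  public static long parseMillis(String spec) {
    Matcher m = SPEC.matcher(spec);
    if (!m.matches()) {
      throw new IllegalArgumentException("Bad roll interval: " + spec);
    }
    long count = Long.parseLong(m.group(1));
    String unit = m.group(2).toLowerCase(Locale.ROOT);
    // Units are case insensitive and may be abbreviated or plural;
    // if no unit is given, hours are assumed.
    if (unit.isEmpty() || unit.startsWith("h")) {
      return TimeUnit.HOURS.toMillis(count);
    } else if (unit.startsWith("m")) {
      return TimeUnit.MINUTES.toMillis(count);
    } else if (unit.startsWith("d")) {
      return TimeUnit.DAYS.toMillis(count);
    }
    throw new IllegalArgumentException("Unknown unit in: " + spec);
  }
}
```

Under these rules, "2", "2h", "2 hour", and "2 hours" all parse to the same two-hour value.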

The roll-offset-interval-millis property sets the upper bound on a random time interval (in milliseconds) that is used to delay before the initial roll. All subsequent rolls will happen an integer number of roll intervals after the initial roll, hence retaining the original offset. The purpose of this property is to insert some variance in the roll times so that large clusters using this sink on every node don't cause a performance impact on HDFS by rolling simultaneously. The default value is 30000 (30s). When writing to HDFS, as a rule of thumb, the roll offset in millis should be no less than the number of sink instances times 5.

The primary use of this class is for logging to HDFS. As it uses FileSystem to access the target file system, however, it can be used to write to the local file system, Amazon S3, or any other supported file system. The base path for the sink will determine the file system used. An unqualified path will write to the default file system set by the configuration.

Not all file systems support the ability to append to files. In file systems without the ability to append to files, only one writer can write to a file at a time. To allow for concurrent writes from multiple daemons on a single host, the source property is used to set unique headers for the log files. The property should be set to the name of the source daemon, e.g. namenode. The value of the source property should typically be the same as the property's prefix. If this property is not set, the source is taken to be unknown.

Instead of appending to an existing file, by default the sink will create a new file with a suffix of ".<n>", where n is the next lowest integer that isn't already used in a file name, similar to the Hadoop daemon logs. NOTE: the file with the highest sequence number is the newest file, unlike the Hadoop daemon logs.
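The suffix selection described above can be sketched as picking the lowest unused integer; the class and method names here are hypothetical, not the sink's internal API.

```java
import java.util.Set;

/** Hypothetical sketch of the ".<n>" suffix selection described above. */
public final class SuffixPicker {
  /** Given the integer suffixes already in use, return the lowest free one. */
  public static int nextSuffix(Set<Integer> used) {
    int n = 1;
    while (used.contains(n)) {
      n++;
    }
    return n;
  }
}
```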

For file systems that allow append, the sink supports appending to the existing file instead. If the allow-append property is set to true, the sink will instead append to the existing file on file systems that support appends. By default, the allow-append property is false.
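Putting the properties above together, a hadoop-metrics2.properties fragment for a NameNode writing to HDFS might look like the following. The instance name "rolling" and the base path are illustrative choices, not fixed names.

```properties
# Illustrative values; adjust the prefix, instance name, and basepath for your cluster.
namenode.sink.rolling.class=org.apache.hadoop.metrics2.sink.RollingFileSystemSink
namenode.sink.rolling.basepath=hdfs:///metrics/namenode
namenode.sink.rolling.source=namenode
namenode.sink.rolling.roll-interval=1 hour
namenode.sink.rolling.roll-offset-interval-millis=30000
namenode.sink.rolling.ignore-error=true
namenode.sink.rolling.allow-append=false
```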

Note that when writing to HDFS with allow-append set to true, there is a minimum acceptable number of data nodes. If the number of data nodes drops below that minimum, the append will succeed, but reading the data will fail with an IOException in the DataStreamer class. The minimum number of data nodes required for a successful append is generally 2 or 3.

Note also that when writing to HDFS, the file size information is not updated until the file is closed (at the end of the interval) even though the data is being written successfully. This is a known HDFS limitation that exists because of the performance cost of updating the metadata. See HDFS-5478.

When using this sink in a secure (Kerberos) environment, two additional properties must be set: keytab-key and principal-key. keytab-key should contain the key by which the keytab file can be found in the configuration, for example, yarn.nodemanager.keytab. principal-key should contain the key by which the principal can be found in the configuration, for example, yarn.nodemanager.principal.
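For example, a NodeManager instance of the sink might use the configuration keys named above (the instance name "rolling" is again illustrative):

```properties
# The values are keys into the existing Hadoop configuration,
# not the keytab path or principal themselves.
nodemanager.sink.rolling.keytab-key=yarn.nodemanager.keytab
nodemanager.sink.rolling.principal-key=yarn.nodemanager.principal
```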

  • Field Details

    • source

      @VisibleForTesting protected String source
    • ignoreError

      @VisibleForTesting protected boolean ignoreError
    • allowAppend

      @VisibleForTesting protected boolean allowAppend
    • basePath

      @VisibleForTesting protected Path basePath
    • rollIntervalMillis

      @VisibleForTesting protected long rollIntervalMillis
    • rollOffsetIntervalMillis

      @VisibleForTesting protected long rollOffsetIntervalMillis
    • nextFlush

      @VisibleForTesting protected Calendar nextFlush
    • forceFlush

      @VisibleForTesting protected static boolean forceFlush
    • hasFlushed

      @VisibleForTesting protected static volatile boolean hasFlushed
    • suppliedConf

      @VisibleForTesting protected static Configuration suppliedConf
    • suppliedFilesystem

      @VisibleForTesting protected static FileSystem suppliedFilesystem
  • Constructor Details

    • RollingFileSystemSink

      public RollingFileSystemSink()
      Create an empty instance. Required for reflection.
    • RollingFileSystemSink

      @VisibleForTesting protected RollingFileSystemSink(long flushIntervalMillis, long flushOffsetIntervalMillis)
      Create an instance for testing.
      Parameters:
      flushIntervalMillis - the roll interval in millis
      flushOffsetIntervalMillis - the roll offset interval in millis
  • Method Details

    • init

      public void init(org.apache.commons.configuration2.SubsetConfiguration metrics2Properties)
      Description copied from interface: MetricsPlugin
      Initialize the plugin
      Specified by:
      init in interface MetricsPlugin
      Parameters:
      metrics2Properties - the configuration object for the plugin
    • getRollInterval

      @VisibleForTesting protected long getRollInterval()
      Extract the roll interval from the configuration and return it in milliseconds.
      Returns:
      the roll interval in millis
    • updateFlushTime

      @VisibleForTesting protected void updateFlushTime(Date now)
      Update the nextFlush variable to the next flush time. Add an integer number of flush intervals, preserving the initial random offset.
      Parameters:
      now - the current time
    • setInitialFlushTime

      @VisibleForTesting protected void setInitialFlushTime(Date now)
      Set the nextFlush variable to the initial flush time. The initial flush will be an integer number of flush intervals past the beginning of the current hour and will have a random offset added, up to rollOffsetIntervalMillis. The initial flush will be a time in the past from which future flush times can be calculated.
      Parameters:
      now - the current time
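      The scheduling behavior of setInitialFlushTime and updateFlushTime can be sketched as below. This is a hypothetical illustration of the documented behavior, not the sink's actual implementation; the class and method names are invented.

```java
import java.util.Calendar;
import java.util.Random;
import java.util.TimeZone;

/**
 * Hypothetical sketch: the initial flush lands an integer number of
 * intervals past the start of the current GMT hour, plus a random offset
 * of up to offsetMillis; later flushes add whole intervals, preserving
 * that offset.
 */
public final class FlushSchedule {

  /** Compute the initial flush time from "now". */
  public static long initialFlushMillis(long nowMillis, long intervalMillis,
                                        long offsetMillis, Random rng) {
    // Floor "now" to the start of the current hour in GMT.
    Calendar hour = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
    hour.setTimeInMillis(nowMillis);
    hour.set(Calendar.MINUTE, 0);
    hour.set(Calendar.SECOND, 0);
    hour.set(Calendar.MILLISECOND, 0);

    // Step forward a whole number of intervals, staying at or before "now".
    long sinceHour = nowMillis - hour.getTimeInMillis();
    long base = hour.getTimeInMillis()
        + (sinceHour / intervalMillis) * intervalMillis;

    // Add a random offset in [0, offsetMillis) to stagger rolls across hosts.
    long offset =
        offsetMillis > 0 ? (long) (rng.nextDouble() * offsetMillis) : 0;
    return base + offset;
  }

  /** Advance a flush time by whole intervals until it is after "now". */
  public static long nextFlushMillis(long flushMillis, long nowMillis,
                                     long intervalMillis) {
    long next = flushMillis;
    while (next <= nowMillis) {
      next += intervalMillis; // whole intervals keep the original offset
    }
    return next;
  }
}
```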
    • putMetrics

      public void putMetrics(MetricsRecord record)
      Description copied from interface: MetricsSink
      Put a metrics record in the sink
      Specified by:
      putMetrics in interface MetricsSink
      Parameters:
      record - the record to put
    • flush

      public void flush()
      Description copied from interface: MetricsSink
      Flush any buffered metrics
      Specified by:
      flush in interface MetricsSink
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable