Class RemoteIterators

java.lang.Object
org.apache.hadoop.util.functional.RemoteIterators

@Public @Unstable public final class RemoteIterators extends Object
A set of remote iterators supporting transformation and filtering, with IOStatisticsSource passthrough, along with methods to convert iterators to lists/arrays and to perform actions on their values.

This aims to make it straightforward to use lambda-expressions to transform the results of an iterator, without losing the statistics in the process, and to chain the operations together.

The close operation is passed through by RemoteIterators which wrap other RemoteIterators. This supports any iterator which can be closed to release held connections, file handles, etc. Unless client code is written on the assumption that RemoteIterator instances may be closed, this is unlikely to be widely used; it is added so that the feature can be adopted in a managed way.

One notable feature is that the foreach(RemoteIterator, ConsumerRaisingIOE) method logs at DEBUG any IOStatistics provided by the iterator, if such statistics are provided. No statistics are retrieved or logged unless the log is at DEBUG, so the feature has essentially zero cost unless the logger org.apache.hadoop.util.functional.RemoteIterators is at DEBUG.

Based on the S3A Listing code, and on some work to move other code to iterative listings so as to pick up the statistics.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
    org.apache.hadoop.util.functional.RemoteIterators.WrappingRemoteIterator<S,T>
    Wrapper of another remote iterator; IOStatistics and Closeable methods are passed down if implemented.
  • Method Summary

    Modifier and Type
    Method
    Description
    static <T> void
    cleanupRemoteIterator(org.apache.hadoop.fs.RemoteIterator<T> source)
    Clean up after an iteration.
    static <S> org.apache.hadoop.fs.RemoteIterator<S>
    closingRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, Closeable toClose)
    This adds an extra close operation alongside the passthrough to any Closeable.close() method supported by the source iterator.
    static <S> org.apache.hadoop.fs.RemoteIterator<S>
    filteringRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, org.apache.hadoop.util.functional.FunctionRaisingIOE<? super S,Boolean> filter)
    Create a RemoteIterator from a RemoteIterator and a filter function which returns true for every element to be passed through.
    static <T> long
    foreach(org.apache.hadoop.fs.RemoteIterator<T> source, org.apache.hadoop.util.functional.ConsumerRaisingIOE<? super T> consumer)
    Apply an operation to all values of a RemoteIterator.
    static <S> org.apache.hadoop.fs.RemoteIterator<S>
    haltableRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, org.apache.hadoop.util.functional.CallableRaisingIOE<Boolean> continueWork)
    Wrap an iterator with one which adds a continuation probe.
    static <S, T> org.apache.hadoop.fs.RemoteIterator<T>
    mappingRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, org.apache.hadoop.util.functional.FunctionRaisingIOE<? super S,T> mapper)
    Create an iterator from an iterator and a transformation function.
    static org.apache.hadoop.fs.RemoteIterator<Long>
    rangeExcludingIterator(long start, long excludedFinish)
    A remote iterator which simply counts up from start, stopping before the value reaches excludedFinish.
    static <T> org.apache.hadoop.fs.RemoteIterator<T>
    remoteIteratorFromArray(T[] array)
    Create a remote iterator from an array.
    static <T> org.apache.hadoop.fs.RemoteIterator<T>
    remoteIteratorFromIterable(Iterable<T> iterable)
    Create a remote iterator from a java.util.Iterable, e.g. a list or other collection.
    static <T> org.apache.hadoop.fs.RemoteIterator<T>
    remoteIteratorFromIterator(Iterator<T> iterator)
    Create a remote iterator from a java.util.Iterator.
    static <T> org.apache.hadoop.fs.RemoteIterator<T>
    remoteIteratorFromSingleton(T singleton)
    Create an iterator from a singleton.
    static <T> T[]
    toArray(org.apache.hadoop.fs.RemoteIterator<T> source, T[] a)
    Build an array from a RemoteIterator.
    static <T> List<T>
    toList(org.apache.hadoop.fs.RemoteIterator<T> source)
    Build a list from a RemoteIterator.
    static <S, T> org.apache.hadoop.fs.RemoteIterator<T>
    typeCastingRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator)
    Create a RemoteIterator from a RemoteIterator, casting the type in the process.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • remoteIteratorFromSingleton

      public static <T> org.apache.hadoop.fs.RemoteIterator<T> remoteIteratorFromSingleton(@Nullable T singleton)
      Create an iterator from a singleton.
      Type Parameters:
      T - type
      Parameters:
      singleton - instance
      Returns:
      a remote iterator
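A minimal sketch of what a singleton iterator involves. So that it compiles without Hadoop on the classpath, it declares a local stand-in RemoteIterator interface mirroring the hasNext()/next() methods of org.apache.hadoop.fs.RemoteIterator (both of which may throw IOException). The class name SingletonSketch is hypothetical, and treating a null singleton as an empty iteration is an assumption suggested by the @Nullable annotation, not a statement of what Hadoop does.

```java
import java.io.IOException;
import java.util.NoSuchElementException;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class SingletonSketch {
  private SingletonSketch() {
  }

  static <T> RemoteIterator<T> fromSingleton(T singleton) {
    return new RemoteIterator<T>() {
      // Assumption: a null singleton yields an empty iteration.
      private boolean processed = singleton == null;

      @Override
      public boolean hasNext() {
        return !processed;
      }

      @Override
      public T next() {
        if (processed) {
          throw new NoSuchElementException();
        }
        processed = true;
        return singleton;
      }
    };
  }
}
```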
    • remoteIteratorFromIterator

      public static <T> org.apache.hadoop.fs.RemoteIterator<T> remoteIteratorFromIterator(Iterator<T> iterator)
      Create a remote iterator from a java.util.Iterator.
      Type Parameters:
      T - type
      Parameters:
      iterator - iterator.
      Returns:
      a remote iterator
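Adapting a java.util.Iterator is a thin delegation, since the wrapped calls never raise IOException. The sketch below assumes a local stand-in RemoteIterator interface (mirroring org.apache.hadoop.fs.RemoteIterator) and a hypothetical IteratorAdapterSketch class; it illustrates the idea rather than reproducing Hadoop's code.

```java
import java.io.IOException;
import java.util.Iterator;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class IteratorAdapterSketch {
  private IteratorAdapterSketch() {
  }

  /** Wraps a java.util.Iterator; the delegated calls never raise IOException. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        // A NoSuchElementException from the wrapped iterator propagates unchanged.
        return iterator.next();
      }
    };
  }
}
```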
    • remoteIteratorFromIterable

      public static <T> org.apache.hadoop.fs.RemoteIterator<T> remoteIteratorFromIterable(Iterable<T> iterable)
      Create a remote iterator from a java.util.Iterable, e.g. a list or other collection.
      Type Parameters:
      T - type
      Parameters:
      iterable - iterable.
      Returns:
      a remote iterator
    • remoteIteratorFromArray

      public static <T> org.apache.hadoop.fs.RemoteIterator<T> remoteIteratorFromArray(T[] array)
      Create a remote iterator from an array.
      Type Parameters:
      T - type
      Parameters:
      array - array.
      Returns:
      a remote iterator
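The Iterable and array factories can both be layered on an Iterator adapter. This sketch assumes a local stand-in RemoteIterator interface and a hypothetical CollectionAdapterSketch class; it shows one plausible layering, not necessarily how Hadoop structures these methods.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class CollectionAdapterSketch {
  private CollectionAdapterSketch() {
  }

  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }

  /** Any Iterable (a list, set or other collection) is adapted through its iterator. */
  static <T> RemoteIterator<T> fromIterable(Iterable<T> iterable) {
    return fromIterator(iterable.iterator());
  }

  /** Arrays.asList gives a fixed-size list view backed by the array, without copying it. */
  static <T> RemoteIterator<T> fromArray(T[] array) {
    return fromIterable(Arrays.asList(array));
  }
}
```

Because Arrays.asList returns a view rather than a copy, writes to the array made during iteration would be visible to the iterator in this sketch.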
    • mappingRemoteIterator

      public static <S, T> org.apache.hadoop.fs.RemoteIterator<T> mappingRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, org.apache.hadoop.util.functional.FunctionRaisingIOE<? super S,T> mapper)
      Create an iterator from an iterator and a transformation function.
      Type Parameters:
      S - source type
      T - result type
      Parameters:
      iterator - source
      mapper - transformation
      Returns:
      a remote iterator
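A mapping iterator can apply the transformation lazily, one element per next() call. The sketch below assumes local stand-ins for RemoteIterator and FunctionRaisingIOE (mirroring the Hadoop interfaces, whose apply() may throw IOException) and a hypothetical MappingSketch class; it is an illustration, not Hadoop's implementation.

```java
import java.io.IOException;
import java.util.Iterator;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Local stand-in mirroring org.apache.hadoop.util.functional.FunctionRaisingIOE. */
@FunctionalInterface
interface FunctionRaisingIOE<T, R> {
  R apply(T t) throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class MappingSketch {
  private MappingSketch() {
  }

  static <S, T> RemoteIterator<T> map(RemoteIterator<S> source,
      FunctionRaisingIOE<? super S, T> mapper) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() throws IOException {
        return source.hasNext();
      }

      @Override
      public T next() throws IOException {
        // Lazy: each element is transformed only when it is requested.
        return mapper.apply(source.next());
      }
    };
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```

Any IOException raised by the mapper surfaces from next(), so callers handle it in the same place as source failures.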
    • typeCastingRemoteIterator

      public static <S, T> org.apache.hadoop.fs.RemoteIterator<T> typeCastingRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator)
      Create a RemoteIterator from a RemoteIterator, casting the type in the process. This helps with filesystem API calls where overloading causes confusion (e.g. listStatusIterator()).
      Type Parameters:
      S - source type
      T - result type
      Parameters:
      iterator - source
      Returns:
      a remote iterator
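Java generics are invariant, so an iterator of a subtype (for example Integer) is not usable where an iterator of its supertype (Number) is expected. A type-casting wrapper bridges that gap with an unchecked cast. This sketch assumes a local stand-in RemoteIterator interface and a hypothetical TypeCastSketch class.

```java
import java.io.IOException;
import java.util.Iterator;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class TypeCastSketch {
  private TypeCastSketch() {
  }

  static <S, T> RemoteIterator<T> cast(RemoteIterator<S> source) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() throws IOException {
        return source.hasNext();
      }

      @Override
      @SuppressWarnings("unchecked")
      public T next() throws IOException {
        // The cast is erased at runtime; a wrong T fails later, at the caller's use site.
        return (T) source.next();
      }
    };
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```

Because of erasure, a mismatched target type produces a ClassCastException where the caller uses the element, not inside next().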
    • filteringRemoteIterator

      public static <S> org.apache.hadoop.fs.RemoteIterator<S> filteringRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, org.apache.hadoop.util.functional.FunctionRaisingIOE<? super S,Boolean> filter)
      Create a RemoteIterator from a RemoteIterator and a filter function which returns true for every element to be passed through.

      Elements are filtered in the hasNext() method; if hasNext() is not called, the filtering is done on demand in the next() call.

      Type Parameters:
      S - type
      Parameters:
      iterator - source
      filter - filter
      Returns:
      a remote iterator
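The filtering behaviour described above (filter in hasNext(), or on demand in next()) can be modelled with a one-element lookahead. This sketch assumes local stand-ins for RemoteIterator and FunctionRaisingIOE and a hypothetical FilteringSketch class; a separate hasPending flag is used so that null elements are handled correctly.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Local stand-in mirroring org.apache.hadoop.util.functional.FunctionRaisingIOE. */
@FunctionalInterface
interface FunctionRaisingIOE<T, R> {
  R apply(T t) throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class FilteringSketch {
  private FilteringSketch() {
  }

  static <S> RemoteIterator<S> filter(RemoteIterator<S> source,
      FunctionRaisingIOE<? super S, Boolean> predicate) {
    return new RemoteIterator<S>() {
      private S pending;          // next accepted element, once found
      private boolean hasPending; // whether pending holds a value

      @Override
      public boolean hasNext() throws IOException {
        // Advance the source until an accepted element is found or it is exhausted.
        while (!hasPending && source.hasNext()) {
          S candidate = source.next();
          if (predicate.apply(candidate)) {
            pending = candidate;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override
      public S next() throws IOException {
        // If the caller skipped hasNext(), the filtering happens here on demand.
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        S result = pending;
        pending = null;
        hasPending = false;
        return result;
      }
    };
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```

Repeated hasNext() calls are safe here: once an element is pending, no further source elements are consumed until next() hands it out.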
    • closingRemoteIterator

      public static <S> org.apache.hadoop.fs.RemoteIterator<S> closingRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, Closeable toClose)
      This adds an extra close operation alongside the passthrough to any Closeable.close() method supported by the source iterator.
      Type Parameters:
      S - source type.
      Parameters:
      iterator - source
      toClose - extra object to close.
      Returns:
      a new iterator
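The combination of "passthrough close" and "extra close" can be sketched as follows, assuming a local stand-in RemoteIterator interface and hypothetical CloseableRemoteIterator and ClosingSketch types. The order (source first, then the extra object, even if the source fails) and the idempotent close() are choices made for this sketch, not documented Hadoop behaviour.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical: a RemoteIterator which can also be closed. */
interface CloseableRemoteIterator<E> extends RemoteIterator<E>, Closeable {
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class ClosingSketch {
  private ClosingSketch() {
  }

  static <S> CloseableRemoteIterator<S> closing(RemoteIterator<S> source, Closeable toClose) {
    return new CloseableRemoteIterator<S>() {
      private boolean closed;

      @Override
      public boolean hasNext() throws IOException {
        return source.hasNext();
      }

      @Override
      public S next() throws IOException {
        return source.next();
      }

      @Override
      public void close() throws IOException {
        if (closed) {
          return; // closing twice is a no-op in this sketch
        }
        closed = true;
        try {
          // Passthrough: close the wrapped iterator if it supports Closeable.
          if (source instanceof Closeable) {
            ((Closeable) source).close();
          }
        } finally {
          // Then close the extra object, even if closing the source failed.
          toClose.close();
        }
      }
    };
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```

Nesting two closing wrappers shows the passthrough: closing the outer one also closes the inner one, and with it the inner wrapper's extra resource.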
    • haltableRemoteIterator

      public static <S> org.apache.hadoop.fs.RemoteIterator<S> haltableRemoteIterator(org.apache.hadoop.fs.RemoteIterator<S> iterator, org.apache.hadoop.util.functional.CallableRaisingIOE<Boolean> continueWork)
      Wrap an iterator with one which adds a continuation probe. This allows work to exit early without complicated breakout logic.
      Type Parameters:
      S - source type.
      Parameters:
      iterator - source
      continueWork - predicate which will trigger a fast halt if it returns false.
      Returns:
      a new iterator
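A continuation probe fits naturally into hasNext(): when the probe returns false, the iterator simply reports that it is exhausted, so an ordinary while loop stops without extra break logic. This sketch assumes local stand-ins for RemoteIterator and CallableRaisingIOE and a hypothetical HaltableSketch class; in it, only hasNext() consults the probe.

```java
import java.io.IOException;
import java.util.Iterator;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Local stand-in mirroring org.apache.hadoop.util.functional.CallableRaisingIOE. */
@FunctionalInterface
interface CallableRaisingIOE<R> {
  R apply() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class HaltableSketch {
  private HaltableSketch() {
  }

  static <S> RemoteIterator<S> haltable(RemoteIterator<S> source,
      CallableRaisingIOE<Boolean> continueWork) {
    return new RemoteIterator<S>() {
      @Override
      public boolean hasNext() throws IOException {
        // Probe before touching the source; false ends the iteration early.
        return continueWork.apply() && source.hasNext();
      }

      @Override
      public S next() throws IOException {
        // In this sketch, next() delegates without probing.
        return source.next();
      }
    };
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```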
    • rangeExcludingIterator

      public static org.apache.hadoop.fs.RemoteIterator<Long> rangeExcludingIterator(long start, long excludedFinish)
      A remote iterator which simply counts up from start, stopping before the value reaches excludedFinish. This is primarily for tests or when submitting work into a TaskPool. It is equivalent to
         for (long l = start; l < excludedFinish; l++) yield l;
      Parameters:
      start - start value
      excludedFinish - excluded finish
      Returns:
      an iterator which returns longs from [start, excludedFinish)
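The half-open range [start, excludedFinish) can be produced by a small counter, which also makes the "yield" pseudocode concrete. This sketch assumes a local stand-in RemoteIterator interface and a hypothetical RangeSketch class.

```java
import java.io.IOException;
import java.util.NoSuchElementException;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class RangeSketch {
  private RangeSketch() {
  }

  /** Yields start, start + 1, ..., excludedFinish - 1; empty if start >= excludedFinish. */
  static RemoteIterator<Long> range(long start, long excludedFinish) {
    return new RemoteIterator<Long>() {
      private long current = start;

      @Override
      public boolean hasNext() {
        return current < excludedFinish;
      }

      @Override
      public Long next() {
        if (current >= excludedFinish) {
          throw new NoSuchElementException();
        }
        return current++;
      }
    };
  }
}
```

For example, range(3, 6) yields 3, 4 and 5, while range(5, 5) yields nothing.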
    • toList

      public static <T> List<T> toList(org.apache.hadoop.fs.RemoteIterator<T> source) throws IOException
      Build a list from a RemoteIterator.
      Type Parameters:
      T - type
      Parameters:
      source - source iterator
      Returns:
      a list of the values.
      Throws:
      IOException - if the source RemoteIterator raises it.
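Building a list amounts to draining the iterator into memory, which is fine for test data or modest listings but unsuitable for very large ones. This sketch assumes a local stand-in RemoteIterator interface and a hypothetical ToListSketch class.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class ToListSketch {
  private ToListSketch() {
  }

  static <T> List<T> toList(RemoteIterator<T> source) throws IOException {
    List<T> list = new ArrayList<>();
    // Drain the whole iterator; any IOException from the source propagates.
    while (source.hasNext()) {
      list.add(source.next());
    }
    return list;
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```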
    • toArray

      public static <T> T[] toArray(org.apache.hadoop.fs.RemoteIterator<T> source, T[] a) throws IOException
      Build an array from a RemoteIterator.
      Type Parameters:
      T - type
      Parameters:
      source - source iterator
      a - destination array; if too small a new array of the same type is created
      Returns:
      an array of the values.
      Throws:
      IOException - if the source RemoteIterator raises it.
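One straightforward way to get the "new array of the same type if too small" behaviour is to collect into a list and delegate to java.util.List.toArray(T[]), which already has that contract. This is a sketch under that assumption, with a local stand-in RemoteIterator interface and a hypothetical ToArraySketch class.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class ToArraySketch {
  private ToArraySketch() {
  }

  static <T> T[] toArray(RemoteIterator<T> source, T[] a) throws IOException {
    // List.toArray(T[]) fills `a` if it is large enough (setting the slot after the
    // last element to null), otherwise it allocates an array of a's runtime type.
    return toList(source).toArray(a);
  }

  static <T> List<T> toList(RemoteIterator<T> source) throws IOException {
    List<T> list = new ArrayList<>();
    while (source.hasNext()) {
      list.add(source.next());
    }
    return list;
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```

Passing a zero-length array, as in toArray(it, new String[0]), is the usual idiom when the size is not known in advance.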
    • foreach

      public static <T> long foreach(org.apache.hadoop.fs.RemoteIterator<T> source, org.apache.hadoop.util.functional.ConsumerRaisingIOE<? super T> consumer) throws IOException
      Apply an operation to all values of a RemoteIterator. If the iterator is an IOStatisticsSource returning a non-null set of statistics, and this class's log is set to DEBUG, then the statistics of the operation are evaluated and logged at DEBUG.

      The number of entries processed is returned, as it is useful to know this, especially during tests or when reporting values to users.

      This does not close the iterator afterwards.
      Type Parameters:
      T - type of source
      Parameters:
      source - iterator source
      consumer - consumer of the values.
      Returns:
      the number of elements processed
      Throws:
      IOException - if the source RemoteIterator or the consumer raises one.
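The foreach contract (count the processed entries, look at statistics only when debug logging is on, leave the iterator open) can be sketched as below. It assumes local stand-ins for RemoteIterator and ConsumerRaisingIOE, a hypothetical StatisticsSource interface in place of IOStatisticsSource, and java.util.logging's FINE level in place of SLF4J's DEBUG; the class name ForeachSketch is also hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Local stand-in mirroring org.apache.hadoop.util.functional.ConsumerRaisingIOE. */
@FunctionalInterface
interface ConsumerRaisingIOE<T> {
  void accept(T t) throws IOException;
}

/** Hypothetical stand-in for IOStatisticsSource. */
interface StatisticsSource {
  String statisticsSummary();
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class ForeachSketch {
  private static final Logger LOG = Logger.getLogger(ForeachSketch.class.getName());

  private ForeachSketch() {
  }

  static <T> long foreach(RemoteIterator<T> source, ConsumerRaisingIOE<? super T> consumer)
      throws IOException {
    long count = 0;
    while (source.hasNext()) {
      // An IOException from the source or the consumer stops the loop and propagates.
      consumer.accept(source.next());
      count++;
    }
    // Statistics are only looked up when debug-level (FINE) logging is enabled.
    if (LOG.isLoggable(Level.FINE) && source instanceof StatisticsSource) {
      LOG.fine("statistics: " + ((StatisticsSource) source).statisticsSummary());
    }
    // The iterator is deliberately left open.
    return count;
  }

  /** Test helper: adapts a java.util.Iterator. */
  static <T> RemoteIterator<T> fromIterator(Iterator<T> iterator) {
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public T next() {
        return iterator.next();
      }
    };
  }
}
```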
    • cleanupRemoteIterator

      public static <T> void cleanupRemoteIterator(org.apache.hadoop.fs.RemoteIterator<T> source)
      Clean up after an iteration. If the log is at DEBUG, the IOStatistics are calculated and logged. If the iterator is Closeable, it is cast and then closed.
      Type Parameters:
      T - type of source
      Parameters:
      source - iterator source
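Since cleanupRemoteIterator declares no IOException, a failure while closing has to be handled inside it. The sketch below assumes a local stand-in RemoteIterator interface, a hypothetical StatisticsSource interface in place of IOStatisticsSource, java.util.logging's FINE level in place of DEBUG, and a hypothetical CleanupSketch class; logging and swallowing close failures is an assumption of this sketch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;

  E next() throws IOException;
}

/** Hypothetical stand-in for IOStatisticsSource. */
interface StatisticsSource {
  String statisticsSummary();
}

/** Hypothetical helper class; not the Hadoop implementation. */
final class CleanupSketch {
  private static final Logger LOG = Logger.getLogger(CleanupSketch.class.getName());

  private CleanupSketch() {
  }

  static <T> void cleanupRemoteIterator(RemoteIterator<T> source) {
    // Statistics are only gathered when debug-level (FINE) logging is enabled.
    if (LOG.isLoggable(Level.FINE) && source instanceof StatisticsSource) {
      LOG.fine("statistics: " + ((StatisticsSource) source).statisticsSummary());
    }
    // Only iterators which also implement Closeable are closed.
    if (source instanceof Closeable) {
      try {
        ((Closeable) source).close();
      } catch (IOException e) {
        // Assumption: close failures are logged and swallowed, not rethrown.
        LOG.log(Level.FINE, "failed to close " + source, e);
      }
    }
  }
}
```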