Class NativeAzureFileSystem

All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, org.apache.hadoop.fs.PathCapabilities, org.apache.hadoop.security.token.DelegationTokenIssuer

@Public @Stable public class NativeAzureFileSystem extends FileSystem
A FileSystem for reading and writing files stored on Windows Azure. This implementation is blob-based and stores files on Azure in their native form so they can be read by other Azure tools.
  • Field Details

    • LOG

      public static final org.slf4j.Logger LOG
    • AZURE_CHOWN_USERLIST_PROPERTY_NAME

      public static final String AZURE_CHOWN_USERLIST_PROPERTY_NAME
      Configuration property specifying the list of users who can perform chown operations when authorization is enabled in WASB.
    • AZURE_DAEMON_USERLIST_PROPERTY_NAME

      public static final String AZURE_DAEMON_USERLIST_PROPERTY_NAME
      Configuration property specifying the list of daemon users who can perform chmod operations when authorization is enabled in WASB.
    • AZURE_CHMOD_USERLIST_PROPERTY_NAME

      public static final String AZURE_CHMOD_USERLIST_PROPERTY_NAME
      Configuration property specifying the list of users who can perform chmod operations when authorization is enabled in WASB.
    • SKIP_AZURE_METRICS_PROPERTY_NAME

      public static final String SKIP_AZURE_METRICS_PROPERTY_NAME
    • APPEND_SUPPORT_ENABLE_PROPERTY_NAME

      public static final String APPEND_SUPPORT_ENABLE_PROPERTY_NAME
    • RETURN_URI_AS_CANONICAL_SERVICE_NAME_PROPERTY_NAME

      public static final String RETURN_URI_AS_CANONICAL_SERVICE_NAME_PROPERTY_NAME
    • AZURE_RENAME_THREADS

      public static final String AZURE_RENAME_THREADS
      Configuration property that sets the number of threads used for rename operations.
    • DEFAULT_AZURE_RENAME_THREADS

      public static final int DEFAULT_AZURE_RENAME_THREADS
      The default number of threads used for rename operations.
    • AZURE_DELETE_THREADS

      public static final String AZURE_DELETE_THREADS
      Configuration property that sets the number of threads used for delete operations.
    • DEFAULT_AZURE_DELETE_THREADS

      public static final int DEFAULT_AZURE_DELETE_THREADS
      The default number of threads used for delete operations.
    • KEY_AZURE_AUTHORIZATION

      public static final String KEY_AZURE_AUTHORIZATION
      Configuration key to enable authorization support in WASB.
  • Constructor Details

    • NativeAzureFileSystem

      public NativeAzureFileSystem()
    • NativeAzureFileSystem

      public NativeAzureFileSystem(org.apache.hadoop.fs.azure.NativeFileSystemStore store)
  • Method Details

    • getScheme

      public String getScheme()
      Description copied from class: FileSystem
      Return the protocol scheme for this FileSystem.

      The base FileSystem implementation throws an UnsupportedOperationException.

      Overrides:
      getScheme in class FileSystem
      Returns:
      the protocol scheme for this FileSystem.
    • getCanonicalServiceName

      public String getCanonicalServiceName()
      If fs.azure.override.canonical.service.name is set to true, returns the URI of the WASB filesystem; otherwise uses the default implementation.
      Specified by:
      getCanonicalServiceName in interface org.apache.hadoop.security.token.DelegationTokenIssuer
      Overrides:
      getCanonicalServiceName in class FileSystem
      Returns:
      a service string that uniquely identifies this file system
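      The choice described for getCanonicalServiceName can be sketched in plain Java. This is a minimal illustration, not the Hadoop code: the class CanonicalServiceNameSketch, its parameters, and the sample URI are assumptions made for the example, and the flag is passed in directly instead of being read from a Configuration.

```java
import java.net.URI;

/** Standalone sketch; the names here are hypothetical and not part of Hadoop. */
final class CanonicalServiceNameSketch {
    private CanonicalServiceNameSketch() {}

    /**
     * Returns the filesystem URI when the override flag is on,
     * otherwise whatever the default implementation would have produced.
     */
    static String canonicalServiceName(boolean overrideEnabled, URI fsUri, String defaultName) {
        return overrideEnabled ? fsUri.toString() : defaultName;
    }
}
```

      With the flag enabled, the service name is simply the string form of the filesystem URI; with it disabled, the caller-supplied default is returned unchanged.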
    • newMetricsSourceName

      @VisibleForTesting public static String newMetricsSourceName()
      Creates a new metrics source name that's unique within this process.
      Returns:
      metric source name
    • checkPath

      protected void checkPath(Path path)
      Description copied from class: FileSystem
      Check that a Path belongs to this FileSystem. The base implementation performs case insensitive equality checks of the URIs' schemes and authorities. Subclasses may implement slightly different checks.
      Overrides:
      checkPath in class FileSystem
      Parameters:
      path - to check
    • initialize

      public void initialize(URI uri, Configuration conf) throws IOException, IllegalArgumentException
      Description copied from class: FileSystem
      Initialize a FileSystem. Called after the new FileSystem instance is constructed, and before it is ready for use. FileSystem implementations overriding this method MUST forward it to their superclass, though the order in which it is done, and whether to alter the configuration before the invocation are options of the subclass.
      Overrides:
      initialize in class FileSystem
      Parameters:
      uri - a URI whose authority section names the host, port, etc. for this FileSystem
      conf - the configuration
      Throws:
      IOException - on any failure to initialize this instance.
      IllegalArgumentException - if the URI is considered invalid.
    • getHomeDirectory

      public Path getHomeDirectory()
      Description copied from class: FileSystem
      Return the current user's home directory in this FileSystem. The default implementation returns "/user/$USER/".
      Overrides:
      getHomeDirectory in class FileSystem
      Returns:
      the path.
    • updateWasbAuthorizer

      @VisibleForTesting public void updateWasbAuthorizer(org.apache.hadoop.fs.azure.WasbAuthorizerInterface authorizer)
    • pathToKey

      @VisibleForTesting public String pathToKey(Path path)
      Convert the path to a key. By convention, any leading or trailing slash is removed, except for the special case of a single slash.
      Parameters:
      path - the path to convert to a key
      Returns:
      key string
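      The slash-stripping convention can be illustrated on plain strings. The sketch below is an assumption-laden stand-in, not the method's actual body: it works on String instead of Path, and the class name PathKeySketch is hypothetical.

```java
/** Standalone sketch of the key convention; not the Hadoop implementation. */
final class PathKeySketch {
    private PathKeySketch() {}

    /** Strips one leading and one trailing slash, but keeps a lone "/" unchanged. */
    static String pathToKey(String path) {
        if (path.equals("/")) {
            return path; // special case: the root stays a single slash
        }
        String key = path;
        if (key.startsWith("/")) {
            key = key.substring(1); // drop the leading slash
        }
        if (key.endsWith("/")) {
            key = key.substring(0, key.length() - 1); // drop the trailing slash
        }
        return key;
    }
}
```

      Under these assumptions, "/user/alice/" maps to "user/alice", while "/" maps to itself.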
    • makeAbsolute

      @VisibleForTesting public Path makeAbsolute(Path path)
      Get the absolute version of the path (fully qualified). This is public for testing purposes.
      Parameters:
      path - the path to make absolute.
      Returns:
      fully qualified path
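      As a rough illustration of what making a path absolute involves, the sketch below resolves a relative path against a working directory. It is a simplification under stated assumptions: it handles only the path component (no scheme or authority), uses plain strings, and the class name MakeAbsoluteSketch is hypothetical.

```java
/** Standalone sketch; resolves only the path component, not scheme or authority. */
final class MakeAbsoluteSketch {
    private MakeAbsoluteSketch() {}

    /** Returns path unchanged if it is already absolute, else resolves it against workingDir. */
    static String makeAbsolute(String workingDir, String path) {
        if (path.startsWith("/")) {
            return path; // already absolute
        }
        // Avoid a doubled slash when the working directory is "/" or ends with one.
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return base + path;
    }
}
```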
    • getStore

      @VisibleForTesting public org.apache.hadoop.fs.azure.AzureNativeFileSystemStore getStore()
      For unit test purposes, retrieves the AzureNativeFileSystemStore store backing this file system.
      Returns:
      The store object.
    • getInstrumentation

      public AzureFileSystemInstrumentation getInstrumentation()
      Gets the metrics source for this file system. This is mainly here for unit testing purposes.
      Returns:
      the metrics source.
    • append

      public FSDataOutputStream append(Path f, int bufferSize, Progressable progress) throws IOException
      This optional operation is not yet supported.
      Specified by:
      append in class FileSystem
      Parameters:
      f - the existing file to be appended.
      bufferSize - the size of the buffer to be used.
      progress - for reporting progress if it is not null.
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • create

      public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Create an FSDataOutputStream at the indicated Path with write-progress reporting.
      Specified by:
      create in class FileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
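      The effect of the overwrite parameter can be shown with a small in-memory model. This is only a sketch: CreateOverwriteSketch is a hypothetical class backed by a HashMap, and it uses java.nio.file.FileAlreadyExistsException as a stand-in for the error the real filesystem raises.

```java
import java.io.ByteArrayOutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of create(path, overwrite); not the Hadoop implementation. */
final class CreateOverwriteSketch {
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    /** Creates (or replaces) a file and returns a stream for writing its contents. */
    ByteArrayOutputStream create(String path, boolean overwrite) throws FileAlreadyExistsException {
        if (files.containsKey(path) && !overwrite) {
            // overwrite == false: an existing file is an error
            throw new FileAlreadyExistsException(path);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        files.put(path, out); // overwrite == true: the old contents are discarded
        return out;
    }

    /** Returns the bytes written so far to the given file. */
    byte[] contents(String path) {
        return files.get(path).toByteArray();
    }
}
```

      Creating the same path twice succeeds only when the second call passes overwrite as true, in which case the earlier contents are replaced by an empty file.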
    • acquireLease

      public org.apache.hadoop.fs.azure.SelfRenewingLease acquireLease(Path path) throws AzureException
      Get a self-renewing lease on the specified file.
      Parameters:
      path - the path on which to acquire the lease.
      Returns:
      Lease
      Throws:
      AzureException - if a lease cannot be acquired on the path
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
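      The difference between create() and createNonRecursive() comes down to how a missing parent directory is handled. The in-memory sketch below illustrates that contrast; the class NonRecursiveCreateSketch is hypothetical, and the use of FileNotFoundException for the missing-parent case is an assumption of the sketch.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

/** In-memory sketch contrasting create() with createNonRecursive(); not Hadoop code. */
final class NonRecursiveCreateSketch {
    private final Set<String> directories = new HashSet<>(Set.of("/"));
    private final Set<String> files = new HashSet<>();

    /** create(): registers every missing ancestor directory, then the file. */
    void create(String path) {
        for (String p = parentOf(path); !p.equals("/"); p = parentOf(p)) {
            directories.add(p);
        }
        files.add(path);
    }

    /** createNonRecursive(): fails instead of creating a missing parent. */
    void createNonRecursive(String path) throws FileNotFoundException {
        String parent = parentOf(path);
        if (!directories.contains(parent)) {
            throw new FileNotFoundException("Parent does not exist: " + parent);
        }
        files.add(path);
    }

    boolean isFile(String path) {
        return files.contains(path);
    }

    /** Returns the parent of an absolute path, or "/" for top-level entries. */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```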
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, EnumSet<CreateFlag> flags, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      flags - CreateFlags to use for this stream.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FileSystem
      Parameters:
      f - the file name to open
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • createInternal

      protected FSDataOutputStream createInternal(Path f, FsPermission permission, boolean overwrite, org.apache.hadoop.fs.azure.SelfRenewingLease parentFolderLease) throws FileAlreadyExistsException, IOException
      This is the version of the create call that is meant for internal usage. This version is not public facing and does not perform authorization checks. It is used by the public facing create call and by FolderRenamePending to create the internal -RenamePending.json file.
      Parameters:
      f - the path to a file to be created.
      permission - for the newly created file.
      overwrite - specifies if the file should be overwritten.
      parentFolderLease - lease on the parent folder.
      Returns:
      the output stream used to write data into the newly created file.
      Throws:
      IOException - if an IO error occurs while attempting to create the file.
      FileAlreadyExistsException
    • delete

      @Deprecated public boolean delete(Path path) throws IOException
      Deprecated.
      Description copied from class: FileSystem
      Delete a file/directory.
      Overrides:
      delete in class FileSystem
      Parameters:
      path - the path.
      Returns:
      true if the delete succeeds, false otherwise.
      Throws:
      IOException - IO failure.
    • delete

      public boolean delete(Path f, boolean recursive) throws IOException
      Description copied from class: FileSystem
      Delete a file.
      Specified by:
      delete in class FileSystem
      Parameters:
      f - the path to delete.
      recursive - if the path is a directory and this is set to true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be set to either true or false.
      Returns:
      true if the delete is successful, otherwise false.
      Throws:
      IOException - IO failure
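      The role of the recursive flag can be sketched with an in-memory set of paths. The model below is an illustration under assumptions, not the WASB logic: RecursiveDeleteSketch is hypothetical, it raises the exception only for a directory that still has children, and it returns false for a path that does not exist.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** In-memory sketch of delete(path, recursive); not the Hadoop implementation. */
final class RecursiveDeleteSketch {
    private final Set<String> paths = new HashSet<>();

    void add(String path) {
        paths.add(path);
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    /** Deletes a path; a directory with children requires recursive == true. */
    boolean delete(String path, boolean recursive) throws IOException {
        if (!paths.contains(path)) {
            return false; // assumption of this sketch: nothing to delete
        }
        String prefix = path + "/";
        boolean hasChildren = paths.stream().anyMatch(p -> p.startsWith(prefix));
        if (hasChildren && !recursive) {
            throw new IOException("Directory is not empty: " + path);
        }
        // Remove the path itself and everything underneath it.
        paths.removeIf(p -> p.equals(path) || p.startsWith(prefix));
        return true;
    }
}
```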
    • delete

      public boolean delete(Path f, boolean recursive, boolean skipParentFolderLastModifiedTimeUpdate) throws IOException
      Delete the specified file or folder. The parameter skipParentFolderLastModifiedTimeUpdate is used in the case of atomic folder rename redo. In that case there is a lease on the parent folder, so (without reworking the code) updating the parent folder's modified time would fail because of a conflict with the lease. Because the folder is about to be deleted anyway, an accurate modified time is not necessary, and it is easier to skip the update.
      Parameters:
      f - file path to be deleted.
      recursive - specify deleting recursively or not.
      skipParentFolderLastModifiedTimeUpdate - If true, don't update the folder last modified time.
      Returns:
      true if and only if the file is deleted
      Throws:
      IOException - if the file or directory cannot be deleted.
    • getThreadPoolExecutor

      public org.apache.hadoop.fs.azure.AzureFileSystemThreadPoolExecutor getThreadPoolExecutor(int threadCount, String threadNamePrefix, String operation, String key, String config)
    • getFileStatus

      public FileStatus getFileStatus(Path f) throws FileNotFoundException, IOException
      Description copied from class: FileSystem
      Return a file status object that represents the path.
      Specified by:
      getFileStatus in class FileSystem
      Parameters:
      f - The path we want information from
      Returns:
      a FileStatus object
      Throws:
      FileNotFoundException - when the path does not exist
      IOException - see specific implementation
    • existsInternal

      protected boolean existsInternal(Path f) throws IOException
      Checks if a given path exists in the filesystem. Calls getFileStatusInternal and has the same costs as the public facing exists call. This internal version of the exists call does not perform authorization checks, and is used internally by various filesystem operations that need to check if the parent/ancestor/path exist. The idea is to avoid having to configure authorization policies for these internal calls.
      Parameters:
      f - the path to a file or directory.
      Returns:
      true if path exists; otherwise false.
      Throws:
      IOException - if an IO error occurs while attempting to check for existence of the path.
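      An existence check built on a status lookup, as described for existsInternal, can be sketched as follows. The class ExistsSketch and its map of file sizes are hypothetical stand-ins for the real status call; the point is only that a FileNotFoundException from the lookup translates into false.

```java
import java.io.FileNotFoundException;
import java.util.Map;

/** Sketch of an exists() check layered on a status lookup; not Hadoop code. */
final class ExistsSketch {
    private final Map<String, Long> sizes; // path -> file length, standing in for a status object

    ExistsSketch(Map<String, Long> sizes) {
        this.sizes = sizes;
    }

    /** Stand-in for the status lookup: throws if the path is unknown. */
    long getFileStatus(String path) throws FileNotFoundException {
        Long size = sizes.get(path);
        if (size == null) {
            throw new FileNotFoundException(path);
        }
        return size;
    }

    /** Costs exactly one status lookup, whether or not the path exists. */
    boolean exists(String path) {
        try {
            getFileStatus(path);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }
}
```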
    • getUri

      public URI getUri()
      Description copied from class: FileSystem
      Returns a URI which identifies this FileSystem.
      Specified by:
      getUri in class FileSystem
      Returns:
      the URI of this filesystem.
    • listStatus

      public FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException
      Retrieve the status of a given path if it is a file, or of all the contained files if it is a directory.
      Specified by:
      listStatus in class FileSystem
      Parameters:
      f - given path
      Returns:
      the statuses of the files/directories in the given path
      Throws:
      FileNotFoundException - when the path does not exist
      IOException - see specific implementation
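      The two cases of listStatus (a file returns its own status, a directory returns the statuses of its direct children) can be modelled in memory. This is a sketch under assumptions: ListStatusSketch is hypothetical, it returns path strings instead of FileStatus objects, and the sorted order is a choice of the sketch.

```java
import java.io.FileNotFoundException;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** In-memory sketch of listStatus(); returns path strings instead of FileStatus objects. */
final class ListStatusSketch {
    private final Map<String, Boolean> isDirectory; // path -> true if it is a directory

    ListStatusSketch(Map<String, Boolean> isDirectory) {
        this.isDirectory = isDirectory;
    }

    List<String> listStatus(String path) throws FileNotFoundException {
        Boolean dir = isDirectory.get(path);
        if (dir == null) {
            throw new FileNotFoundException(path);
        }
        if (!dir) {
            return List.of(path); // a file lists as itself
        }
        String prefix = path.endsWith("/") ? path : path + "/";
        return isDirectory.keySet().stream()
                .filter(p -> p.startsWith(prefix)
                        && p.length() > prefix.length()
                        && p.indexOf('/', prefix.length()) < 0) // direct children only
                .sorted()
                .collect(Collectors.toList());
    }
}
```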
    • mkdirs

      public boolean mkdirs(Path f, FsPermission permission) throws IOException
      Description copied from class: FileSystem
      Make the given file and all non-existent parents into directories. Has roughly the semantics of Unix mkdir -p. Existence of the directory hierarchy is not an error.
      Specified by:
      mkdirs in class FileSystem
      Parameters:
      f - path to create
      permission - to apply to f
      Returns:
      true if the directories are created successfully, false otherwise.
      Throws:
      IOException - IO failure
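      The mkdir -p behaviour can be demonstrated on the local disk with the JDK's own Files.createDirectories, which has the same "create missing parents, tolerate existing directories" contract. Note that this sketch uses java.nio.file.Path, not Hadoop's Path, and the class name MkdirsSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-disk sketch of mkdir -p semantics using java.nio.file; not Hadoop code. */
final class MkdirsSketch {
    private MkdirsSketch() {}

    /** Creates dir and any missing parents; calling it again on an existing dir is not an error. */
    static boolean mkdirs(Path dir) throws IOException {
        Files.createDirectories(dir);
        return Files.isDirectory(dir);
    }
}
```

      Calling mkdirs twice on the same nested path returns true both times, which mirrors the statement that an existing directory hierarchy is not an error.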
    • mkdirs

      public boolean mkdirs(Path f, FsPermission permission, boolean noUmask) throws IOException
      Throws:
      IOException
    • open

      public FSDataInputStream open(Path f, int bufferSize) throws FileNotFoundException, IOException
      Description copied from class: FileSystem
      Opens an FSDataInputStream at the indicated Path.
      Specified by:
      open in class FileSystem
      Parameters:
      f - the file name to open
      bufferSize - the size of the buffer to be used.
      Returns:
      input stream.
      Throws:
      IOException - IO failure
      FileNotFoundException
    • openFileWithOptions

      protected CompletableFuture<FSDataInputStream> openFileWithOptions(Path path, org.apache.hadoop.fs.impl.OpenFileParameters parameters) throws IOException
      Description copied from class: FileSystem
      Execute the actual open file operation. This is invoked from FSDataInputStreamBuilder.build() and from DelegateToFileSystem and is where the action of opening the file should begin. The base implementation performs a blocking call to FileSystem.open(Path, int) in this call; the actual outcome is in the returned CompletableFuture. This avoids having to create some thread pool, while still setting up the expectation that the get() call is needed to evaluate the result.
      Overrides:
      openFileWithOptions in class FileSystem
      Parameters:
      path - path to the file
      parameters - open file parameters from the builder.
      Returns:
      a future which will evaluate to the opened file.
      Throws:
      IOException - failure to resolve the link.
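      The pattern described for the base implementation, a blocking call whose outcome is delivered through an already-completed CompletableFuture, can be sketched with JDK classes alone. EagerFutureSketch and its eval method are hypothetical names for this illustration; the Callable stands in for the blocking open call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;

/** Sketch of evaluating a blocking call eagerly into a CompletableFuture; not Hadoop code. */
final class EagerFutureSketch {
    private EagerFutureSketch() {}

    /** Runs the call on the current thread and captures its result or failure in the future. */
    static <T> CompletableFuture<T> eval(Callable<T> blockingOpen) {
        CompletableFuture<T> result = new CompletableFuture<>();
        try {
            result.complete(blockingOpen.call());
        } catch (Exception e) {
            result.completeExceptionally(e); // surfaced later by get() or join()
        }
        return result;
    }
}
```

      No thread pool is involved: the future is already done when it is returned, and callers still go through get() or join() to obtain the stream or observe the failure.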
    • rename

      public boolean rename(Path src, Path dst) throws FileNotFoundException, IOException
      Description copied from class: FileSystem
      Renames Path src to Path dst.
      Specified by:
      rename in class FileSystem
      Parameters:
      src - path to be renamed
      dst - new path after rename
      Returns:
      true if rename is successful
      Throws:
      IOException - on failure
      FileNotFoundException
    • setWorkingDirectory

      public void setWorkingDirectory(Path newDir)
      Set the working directory to the given directory.
      Specified by:
      setWorkingDirectory in class FileSystem
      Parameters:
      newDir - Path of new working directory
    • getWorkingDirectory

      public Path getWorkingDirectory()
      Description copied from class: FileSystem
      Get the current working directory for the given FileSystem.
      Specified by:
      getWorkingDirectory in class FileSystem
      Returns:
      the directory pathname
    • setPermission

      public void setPermission(Path p, FsPermission permission) throws FileNotFoundException, IOException
      Description copied from class: FileSystem
      Set permission of a path.
      Overrides:
      setPermission in class FileSystem
      Parameters:
      p - The path
      permission - permission
      Throws:
      IOException - IO failure
      FileNotFoundException
    • setOwner

      public void setOwner(Path p, String username, String groupname) throws IOException
      Description copied from class: FileSystem
      Set owner of a path (i.e. a file or a directory). The parameters username and groupname cannot both be null.
      Overrides:
      setOwner in class FileSystem
      Parameters:
      p - The path
      username - If it is null, the original username remains unchanged.
      groupname - If it is null, the original groupname remains unchanged.
      Throws:
      IOException - IO failure
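      The null-handling rules for username and groupname can be sketched as follows. SetOwnerSketch is a hypothetical in-memory model, and throwing IllegalArgumentException when both arguments are null is an assumption of the sketch, since the text above only says that both cannot be null.

```java
/** In-memory sketch of the setOwner() null rules; not the Hadoop implementation. */
final class SetOwnerSketch {
    private String owner;
    private String group;

    SetOwnerSketch(String owner, String group) {
        this.owner = owner;
        this.group = group;
    }

    /** A null argument leaves the corresponding value unchanged; both null is rejected. */
    void setOwner(String username, String groupname) {
        if (username == null && groupname == null) {
            // Assumption: the exception type is chosen for illustration only.
            throw new IllegalArgumentException("username and groupname cannot both be null");
        }
        if (username != null) {
            owner = username;
        }
        if (groupname != null) {
            group = groupname;
        }
    }

    String owner() {
        return owner;
    }

    String group() {
        return group;
    }
}
```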
    • setXAttr

      public void setXAttr(Path path, String xAttrName, byte[] value, EnumSet<XAttrSetFlag> flag) throws IOException
      Set the value of an attribute for a path.
      Overrides:
      setXAttr in class FileSystem
      Parameters:
      path - The path on which to set the attribute
      xAttrName - The attribute to set
      value - The byte value of the attribute to set (encoded in utf-8)
      flag - The mode in which to set the attribute
      Throws:
      IOException - If there was an issue setting the attribute on Azure
    • getXAttr

      public byte[] getXAttr(Path path, String xAttrName) throws IOException
      Get the value of an attribute for a path.
      Overrides:
      getXAttr in class FileSystem
      Parameters:
      path - The path on which to get the attribute
      xAttrName - The attribute to get
      Returns:
      The bytes of the attribute's value (encoded in utf-8) or null if the attribute does not exist
      Throws:
      IOException - If there was an issue getting the attribute from Azure
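      A round trip through setXAttr and getXAttr amounts to storing UTF-8 bytes under a name and getting null back for an unknown name. The sketch below models just that; XAttrSketch is hypothetical, keeps attributes in a HashMap, and ignores the flag parameter.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of extended attributes as named byte arrays; not Hadoop code. */
final class XAttrSketch {
    private final Map<String, byte[]> attrs = new HashMap<>();

    /** Stores a copy of the value so later changes to the caller's array have no effect. */
    void setXAttr(String name, byte[] value) {
        attrs.put(name, value.clone());
    }

    /** Returns a copy of the stored bytes, or null if the attribute does not exist. */
    byte[] getXAttr(String name) {
        byte[] value = attrs.get(name);
        return value == null ? null : value.clone();
    }
}
```

      A caller would typically encode with "text".getBytes(StandardCharsets.UTF_8) before setting and decode with new String(bytes, StandardCharsets.UTF_8) after getting, checking for null first.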
    • close

      public void close() throws IOException
      Description copied from class: FileSystem
      Close this FileSystem instance. Will release any held locks, delete all files queued for deletion through calls to FileSystem.deleteOnExit(Path), and remove this FS instance from the cache, if cached. After this operation, the outcome of any method call on this FileSystem instance, or any input/output stream created by it is undefined.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class FileSystem
      Throws:
      IOException - IO failure
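      The delete-on-exit part of close() can be sketched with a small Closeable. DeleteOnExitSketch is a hypothetical in-memory model; it does not cover locks or the filesystem cache, and it stays inspectable after close() purely so the effect can be observed.

```java
import java.io.Closeable;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** In-memory sketch of delete-on-exit processing in close(); not the Hadoop implementation. */
final class DeleteOnExitSketch implements Closeable {
    private final Set<String> files = new HashSet<>();
    private final Set<String> pendingDeletes = new LinkedHashSet<>();

    void create(String path) {
        files.add(path);
    }

    /** Queues an existing path for deletion at close time; returns false if it does not exist. */
    boolean deleteOnExit(String path) {
        if (!files.contains(path)) {
            return false;
        }
        pendingDeletes.add(path);
        return true;
    }

    boolean exists(String path) {
        return files.contains(path);
    }

    /** Deletes everything that was queued, then clears the queue. */
    @Override
    public void close() {
        files.removeAll(pendingDeletes);
        pendingDeletes.clear();
    }
}
```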
    • getDelegationToken

      public Token<?> getDelegationToken(String renewer) throws IOException
      Get a delegation token from remote service endpoint if 'fs.azure.enable.kerberos.support' is set to 'true'.
      Specified by:
      getDelegationToken in interface org.apache.hadoop.security.token.DelegationTokenIssuer
      Parameters:
      renewer - the account name that is allowed to renew the token.
      Returns:
      delegation token
      Throws:
      IOException - if getting the current user fails.
    • access

      public void access(Path path, FsAction mode) throws IOException
      Description copied from class: FileSystem
      Checks if the user can access a path. The mode specifies which access checks to perform. If the requested permissions are granted, then the method returns normally. If access is denied, then the method throws an AccessControlException.

      The default implementation calls FileSystem.getFileStatus(Path) and checks the returned permissions against the requested permissions. Note that the FileSystem.getFileStatus(Path) call will be subject to authorization checks. Typically, this requires search (execute) permissions on each directory in the path's prefix, but this is implementation-defined. Any file system that provides a richer authorization model (such as ACLs) may override the default implementation so that it checks against that model instead.

      In general, applications should avoid using this method, due to the risk of time-of-check/time-of-use race conditions. The permissions on a file may change immediately after the access call returns. Most applications should prefer running specific file system actions as the desired user represented by a UserGroupInformation.

      Parameters:
      path - Path to check
      mode - type of access to check
      Throws:
      AccessControlException - if access is denied
      FileNotFoundException - if the path does not exist
      IOException - see specific implementation
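      The default check, comparing granted permissions against the requested ones, reduces to a bit-mask test. The sketch below illustrates it with plain rwx bits; AccessCheckSketch and its constants are hypothetical, and java.nio.file.AccessDeniedException stands in for Hadoop's AccessControlException.

```java
import java.nio.file.AccessDeniedException;

/** Sketch of a permission-bit comparison; not the Hadoop implementation. */
final class AccessCheckSketch {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    private AccessCheckSketch() {}

    /** Returns normally if every requested bit is granted, otherwise throws. */
    static void access(String path, int grantedBits, int requestedBits) throws AccessDeniedException {
        if ((grantedBits & requestedBits) != requestedBits) {
            throw new AccessDeniedException(path);
        }
    }
}
```

      Even when such a check passes, the permissions may change before the next operation, which is the time-of-check/time-of-use risk noted above.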
    • recoverFilesWithDanglingTempData

      public void recoverFilesWithDanglingTempData(Path root, Path destination) throws IOException
      Looks under the given root path for any blobs that are left "dangling", meaning that they are place-holder blobs created while the data was being uploaded to a temporary blob, but the process crashed in the middle of the upload and left them there. Any that are found are moved to the given destination.
      Parameters:
      root - The root path to consider.
      destination - The destination path to move any recovered files to.
      Throws:
      IOException - if the files cannot be recovered.
    • deleteFilesWithDanglingTempData

      public void deleteFilesWithDanglingTempData(Path root) throws IOException
      Looks under the given root path for any blobs that are left "dangling", meaning that they are place-holder blobs created while the data was being uploaded to a temporary blob, but the process crashed in the middle of the upload and left them there. Any that are found are deleted.
      Parameters:
      root - The root path to consider.
      Throws:
      IOException - if the files cannot be deleted.
    • finalize

      protected void finalize() throws Throwable
      Overrides:
      finalize in class Object
      Throws:
      Throwable
    • getOwnerForPath

      @VisibleForTesting public String getOwnerForPath(Path absolutePath) throws IOException
      Throws:
      IOException
    • hasPathCapability

      public boolean hasPathCapability(Path path, String capability) throws IOException
      Description copied from class: FileSystem
      Probe for a specific capability under the given path. The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations; unless it has a way to explicitly determine the capabilities, this method returns false. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
      • The capability is not known.
      • The capability is known but it is not supported.
      • The capability is known but the filesystem does not know if it is supported under the supplied path.
      The core guarantee which a caller can rely on is: if the predicate returns true, then the specific operation/behavior can be expected to be supported. However a specific call may be rejected for permission reasons, the actual file/directory not being present, or some other failure during the attempted execution of the operation.

      Implementors: PathCapabilitiesSupport can be used to help implement this method.

      Specified by:
      hasPathCapability in interface org.apache.hadoop.fs.PathCapabilities
      Overrides:
      hasPathCapability in class FileSystem
      Parameters:
      path - path to query the capability of.
      capability - non-null, non-empty string to query the path for support.
      Returns:
      true if the capability is supported under that part of the FS.
      Throws:
      IOException - this should not be raised, except on problems resolving paths or relaying the call.
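      The contract above (true means explicitly supported, false covers unknown, unsupported, and undetermined) can be sketched as a lookup in a set of declared capabilities. PathCapabilitySketch and the capability strings used with it are hypothetical; rejecting a null or empty capability with IllegalArgumentException is an assumption of the sketch.

```java
import java.util.Set;

/** Sketch of a capability probe backed by a fixed set; not the Hadoop implementation. */
final class PathCapabilitySketch {
    private final Set<String> supported;

    PathCapabilitySketch(Set<String> supported) {
        this.supported = supported;
    }

    /** Returns true only for capabilities this instance explicitly declares. */
    boolean hasPathCapability(String path, String capability) {
        if (capability == null || capability.isEmpty()) {
            // Assumption: the exception type is chosen for illustration only.
            throw new IllegalArgumentException("capability must be non-null and non-empty");
        }
        // Unknown, unsupported, and undetermined capabilities all report false.
        return supported.contains(capability);
    }
}
```

      A caller should treat false as "do not rely on it" and not as proof that the operation would fail.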