This document explains the upcoming work for upgrading S3A to AWS SDK V2. This work is tracked in HADOOP-18073.
The SDK V2 for S3 is very different from SDK V1, and brings breaking changes for S3A. A complete list of the changes can be found in the Changelog.
`aws-java-sdk-bundle-1.12.x.jar` becomes `bundle-2.x.y.jar`.

As the module name is lost from the filename, Hadoop releases now include a large JAR file simply named "bundle" in the distribution. This is the shaded AWS V2 SDK artifact.
The new and old SDKs can co-exist; the only place that the hadoop code may still use the original SDK is when a non-standard V1 AWS credential provider is declared.
Any deployment of the S3A connector must include this JAR, or the subset of non-shaded aws- JARs needed for communication with S3 and any other services used. As before: the exact set of dependencies used by the S3A connector is neither defined nor covered by any commitment of stability or compatibility of dependent libraries.
This interface change means that custom credential providers will need to be updated to implement `software.amazon.awssdk.auth.credentials.AwsCredentialsProvider` instead of `com.amazonaws.auth.AWSCredentialsProvider`.
HADOOP-18980 introduces an extended version of credential provider remapping. `fs.s3a.aws.credentials.provider.mapping` takes a comma-separated list of key=value pairs, each mapping one credential provider class name to another. Whenever a key appears in `fs.s3a.aws.credentials.provider` or `fs.s3a.assumed.role.credentials.provider`, it is translated into the value declared for it in `fs.s3a.aws.credentials.provider.mapping`.
For example, if `fs.s3a.aws.credentials.provider.mapping` is set to:

```xml
<property>
  <name>fs.s3a.aws.credentials.provider.mapping</name>
  <value>
    com.amazonaws.auth.AnonymousAWSCredentials=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider,
    com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper=org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider,
    com.amazonaws.auth.InstanceProfileCredentialsProvider=org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  </value>
</property>
```
and if `fs.s3a.aws.credentials.provider` is set to:

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.AnonymousAWSCredentials</value>
</property>
```

then `com.amazonaws.auth.AnonymousAWSCredentials` will be internally remapped to `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` by S3A while preparing the AWS credential provider list.
Similarly, if `fs.s3a.assumed.role.credentials.provider` is set to:

```xml
<property>
  <name>fs.s3a.assumed.role.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```

then `com.amazonaws.auth.InstanceProfileCredentialsProvider` will be internally remapped to `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` by S3A while preparing the assumed role AWS credential provider list.
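To make the key=value mapping format concrete, here is a minimal, self-contained sketch of how such a string can be parsed and applied. This is not the S3A implementation; the class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a sketch of parsing a "v1class=v2class,v1class=v2class"
// mapping string, as used by fs.s3a.aws.credentials.provider.mapping.
public class ProviderRemapping {

  // Parse the comma-separated key=value pairs into a map,
  // tolerating surrounding whitespace and newlines.
  static Map<String, String> parseMapping(String config) {
    Map<String, String> mapping = new HashMap<>();
    for (String pair : config.split(",")) {
      String[] kv = pair.trim().split("=", 2);
      if (kv.length == 2) {
        mapping.put(kv[0].trim(), kv[1].trim());
      }
    }
    return mapping;
  }

  // Remap a configured provider class name, falling back to the
  // original name when no mapping entry exists for it.
  static String remap(Map<String, String> mapping, String provider) {
    return mapping.getOrDefault(provider, provider);
  }

  public static void main(String[] args) {
    Map<String, String> mapping = parseMapping(
        "com.amazonaws.auth.AnonymousAWSCredentials="
            + "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider,"
            + "com.amazonaws.auth.InstanceProfileCredentialsProvider="
            + "org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider");
    System.out.println(
        remap(mapping, "com.amazonaws.auth.AnonymousAWSCredentials"));
  }
}
```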
### The V1 `AWSCredentialsProvider` interface

Note how the interface begins with the capitalized "AWS" acronym. The V2 interface starts with "Aws". This is a very subtle change for developers to spot; compilers will detect and report the type mismatch.

```java
package com.amazonaws.auth;

public interface AWSCredentialsProvider {

  public AWSCredentials getCredentials();

  public void refresh();

}
```
The interface binding also supported a factory method, `AWSCredentialsProvider instance()`, which, if available, would be invoked in preference to any constructor. If the implementation class implemented `Closeable` or `AutoCloseable`, its `close()` method would be invoked when the provider chain was being shut down.
### The V2 `AwsCredentialsProvider` interface

```java
package software.amazon.awssdk.auth.credentials;

public interface AwsCredentialsProvider {

  AwsCredentials resolveCredentials();

}
```

* There is no `refresh()` method any more.
* `getCredentials()` has become `resolveCredentials()`.
* Credentials are retrieved on demand through `resolveCredentials()`.
* If the implementation class implements `Closeable` or `AutoCloseable`, these will be invoked when the provider chain is being shut down.
* A static factory method `create()` which returns an `AwsCredentialsProvider` or subclass will be used in preference to a constructor.

### `AWSCredentialProviderList` is now a V2 credential provider

The class `org.apache.hadoop.fs.s3a.AWSCredentialProviderList` has moved from being a V1 to a V2 credential provider; even if an instance can be created with existing code, the V1 methods will not resolve:
```
java.lang.NoSuchMethodError: org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials()Lcom/amazonaws/auth/AWSCredentials;
  at org.apache.hadoop.fs.store.diag.S3ADiagnosticsInfo.validateFilesystem(S3ADiagnosticsInfo.java:903)
```
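For developers migrating a custom provider, the following self-contained sketch shows the V2 shape: credentials resolved on demand through `resolveCredentials()`, no `refresh()` method, and a static `create()` factory which is used in preference to a constructor. The `AwsCredentials`/`AwsCredentialsProvider` stubs below stand in for the real `software.amazon.awssdk.auth.credentials` types so the example compiles on its own; a real provider must implement the SDK interfaces directly, and the provider class itself is hypothetical.

```java
// Stub versions of the V2 SDK types, standing in for
// software.amazon.awssdk.auth.credentials.* so this sketch is self-contained.
interface AwsCredentials {
  String accessKeyId();
  String secretAccessKey();
}

interface AwsCredentialsProvider {
  AwsCredentials resolveCredentials();
}

// A hypothetical custom provider in the V2 style.
public class StaticKeyCredentialsProvider implements AwsCredentialsProvider {
  private final String key;
  private final String secret;

  private StaticKeyCredentialsProvider(String key, String secret) {
    this.key = key;
    this.secret = secret;
  }

  // Static factory: used by S3A in preference to a constructor.
  public static StaticKeyCredentialsProvider create() {
    // A real provider would read from wherever the secrets actually live.
    return new StaticKeyCredentialsProvider("access-key", "secret-key");
  }

  // Credentials are resolved on demand; there is no refresh() method.
  @Override
  public AwsCredentials resolveCredentials() {
    return new AwsCredentials() {
      @Override public String accessKeyId() { return key; }
      @Override public String secretAccessKey() { return secret; }
    };
  }
}
```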
### `fs.s3a.aws.credentials.provider`

Before the upgrade, `fs.s3a.aws.credentials.provider` took a list containing:

1. V1 credential providers implemented in the `hadoop-aws` module.
2. V1 credential providers implemented in the `aws-sdk-bundle` library.
3. Custom V1 credential providers placed onto the classpath.
4. Custom subclasses of hadoop-aws credential providers.
And here is how they change:

1. All `hadoop-aws` credential providers have migrated to V2.
2. Well-known `aws-sdk-bundle` credential providers are automatically remapped to their V2 equivalents.
3. Custom V1 providers are supported if the original `aws-sdk-bundle` JAR is on the classpath.
4. Custom subclasses of hadoop-aws credential providers need manual migration.
Because of (1) and (2), standard `fs.s3a.aws.credentials.provider` configurations should upgrade seamlessly. This also means that the same provider list, if restricted to those classes, will work across versions.
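For example, a provider list restricted to hadoop-aws classes works unchanged on both SDK generations. This is a sketch; choose the providers appropriate to your deployment:

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  </value>
</property>
```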
### `hadoop-aws` credential providers: migration to V2

All the fs.s3a credential providers have the same name and functionality as before.
| Hadoop module credential provider | Authentication Mechanism |
|---|---|
| `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider` | Session Credentials in configuration |
| `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` | Simple name/secret credentials in configuration |
| `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` | Anonymous Login |
| `org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider` | Assumed Role credentials |
| `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` | EC2/k8s instance credentials |
### `aws-sdk-bundle` credential provider remapping

The commonly-used set of V1 credential providers are automatically remapped to V2 equivalents.
| V1 Credential Provider | Remapped V2 substitute |
|---|---|
| `com.amazonaws.auth.AnonymousAWSCredentials` | `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` |
| `com.amazonaws.auth.EnvironmentVariableCredentialsProvider` | `software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider` |
| `com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper` | `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` |
| `com.amazonaws.auth.InstanceProfileCredentialsProvider` | `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` |
| `com.amazonaws.auth.profile.ProfileCredentialsProvider` | `software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider` |
There are still a number of troublespots here:
### Other `com.amazonaws.auth` AWS providers

There should be equivalents in the new SDK, but as well as being renamed they are likely to have moved to different factory/builder mechanisms. Identify the changed classes and use their names in the `fs.s3a.aws.credentials.provider` option.
If a V2 equivalent is not found then, provided the V1 SDK is added to the classpath, it should still be possible to use the existing classes. Adding a V2 equivalent is the recommended long-term solution.
Because all the standard hadoop credential providers have been upgraded, any subclasses of these will no longer link or work. These will need to be manually migrated to V2 credential providers.
The major changes and how this affects S3A are listed below.
* SDK V2 classes are under the package `software.amazon.awssdk`; SDK V1 classes were under `com.amazonaws`.
* V1 credential providers can still be used through adapters in `org.apache.hadoop.fs.s3a.adapter`.
* The request/response classes are no longer `Serializable`.
Most of these changes simply create what will feel to be gratuitous migration effort; however, the removal of the `Serializable` nature from all message response classes can potentially break applications, such as anything passing them between Spark workers. See AWS SDK V2 issue *Simplify Modeled Message Marshalling #82*; note that it was filed in 2017, then implement your own workaround pending that issue being resolved.
Any code making use of V1 SDK classes will fail if it:

* Expects the V1 SDK classes to be on the classpath when `hadoop-aws` is declared as a dependency.
* Uses V1-SDK-compatible methods previously exported by the `S3AFileSystem` class and associated classes.
* Tries to pass s3a classes to V1 SDK classes (e.g. credential providers).

The sole solution to these problems is "move to the V2 SDK".
### Some `S3AUtils` methods are deleted

```
cannot find symbol
[ERROR]   symbol:   method createAwsConf(org.apache.hadoop.conf.Configuration,java.lang.String)
[ERROR]   location: class org.apache.hadoop.fs.s3a.S3AUtils
```
The signature and superclass of `AWSCredentialProviderList` has changed, which can surface in different ways.

Signature mismatch:

```
cannot find symbol
[ERROR]   symbol:   method getCredentials()
[ERROR]   location: variable credentials of type org.apache.hadoop.fs.s3a.AWSCredentialProviderList
```

It is no longer a V1 credential provider, so cannot be used to pass credentials to a V1 SDK class:

```
incompatible types: org.apache.hadoop.fs.s3a.AWSCredentialProviderList cannot be converted to com.amazonaws.auth.AWSCredentialsProvider
```
### `AmazonS3` replaced by `S3Client`; factory and accessor changed

The V1 S3 client class `com.amazonaws.services.s3.AmazonS3` has been superseded by `software.amazon.awssdk.services.s3.S3Client`. The `S3ClientFactory` interface has been replaced by one that creates a V2 `S3Client`.

* Custom implementations will need to be updated.
* The `InconsistentS3ClientFactory` class has been deleted.
### `S3AFileSystem` method changes: `S3AInternals`

The low-level S3 operations/client accessors have been moved into a new interface, `org.apache.hadoop.fs.s3a.S3AInternals`, which must be accessed via the `S3AFileSystem.getS3AInternals()` method. They have also been updated to return V2 SDK classes.
```java
@InterfaceStability.Unstable
@InterfaceAudience.LimitedPrivate("testing/diagnostics")
public interface S3AInternals {

  S3Client getAmazonS3V2Client(String reason);

  S3AStore getStore();

  @Retries.RetryTranslated
  @AuditEntryPoint
  String getBucketLocation() throws IOException;

  @AuditEntryPoint
  @Retries.RetryTranslated
  String getBucketLocation(String bucketName) throws IOException;

  @AuditEntryPoint
  @Retries.RetryTranslated
  HeadObjectResponse getObjectMetadata(Path path) throws IOException;

  AWSCredentialProviderList shareCredentials(final String purpose);

  @AuditEntryPoint
  @Retries.RetryTranslated
  HeadBucketResponse getBucketMetadata() throws IOException;

  boolean isMultipartCopyEnabled();

  @AuditEntryPoint
  @Retries.RetryTranslated
  long abortMultipartUploads(Path path) throws IOException;
}
```
### `S3AFileSystem.getAmazonS3ClientForTesting(String)` moved and return type changed

The `S3AFileSystem.getAmazonS3ClientForTesting()` method has been deleted.
Compilation:

```
cannot find symbol
[ERROR]   symbol:   method getAmazonS3ClientForTesting(java.lang.String)
[ERROR]   location: variable fs of type org.apache.hadoop.fs.s3a.S3AFileSystem
```
It has been replaced by an `S3AInternals` equivalent which returns the V2 `S3Client` of the filesystem instance.

Before:

```java
((S3AFileSystem) fs).getAmazonS3ClientForTesting("testing")
```

After:

```java
((S3AFileSystem) fs).getS3AInternals().getAmazonS3Client("testing")
```
### `S3AFileSystem.getObjectMetadata(Path path)` moved to `S3AInternals`; return type changed

The `getObjectMetadata(Path)` call has been moved to the `S3AInternals` interface and now returns an instance of the `software.amazon.awssdk.services.s3.model.HeadObjectResponse` class. The original `S3AFileSystem` method has been deleted.
Before:

```java
((S3AFileSystem) fs).getObjectMetadata(path)
```

After:

```java
((S3AFileSystem) fs).getS3AInternals().getObjectMetadata(path)
```
### `AWSCredentialProviderList shareCredentials(String)` moved to `S3AInternals`

The operation to share a reference-counted access to the AWS credentials used by the S3A FS has been moved to `S3AInternals`.
This is very much an implementation method, used to allow extension modules to share an authentication chain into other AWS SDK client services (dynamoDB, etc.).
`AWSCredentialProviderList` has been upgraded to the V2 API.

* It still retains a `refresh()` method, but this is now a deprecated no-op.
* It is still `Closeable`; its `close()` method iterates through all entries in the list, and if they are `Closeable` or `AutoCloseable` then their `close()` method is invoked.
* Accordingly, providers may still perform background refreshes in separate threads; the S3A client will close its provider list when the filesystem itself is closed.
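The close-propagation behaviour described above can be illustrated with a small, self-contained sketch. The class here is hypothetical, not the S3A implementation: closing the list closes every `AutoCloseable` entry it holds, which is what lets providers run background refresh threads safely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the close() semantics described above: a provider
// list whose close() method closes every AutoCloseable entry it contains.
public class ClosingProviderList implements AutoCloseable {
  private final List<Object> providers = new ArrayList<>();

  public void add(Object provider) {
    providers.add(provider);
  }

  @Override
  public void close() {
    for (Object p : providers) {
      if (p instanceof AutoCloseable) {
        try {
          // Shut down the provider, e.g. any background refresh threads.
          ((AutoCloseable) p).close();
        } catch (Exception ignored) {
          // Best-effort: shutting down the chain should not fail.
        }
      }
    }
  }
}
```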
Interface change: `com.amazonaws.auth.Signer` has been replaced by `software.amazon.awssdk.core.signer.Signer`.

This change means that custom signers will need to be updated to implement the new interface.
There is no support to assist in this migration.
The callbacks from the SDK have all changed, as has the interface `org.apache.hadoop.fs.s3a.audit.AWSAuditEventCallbacks`. Examine the interface and associated implementations to see how to migrate.

The option `fs.s3a.audit.request.handlers`, which declared a list of V1 SDK `com.amazonaws.handlers.RequestHandler2` implementations to include in the AWS request chain, is no longer supported: a warning is printed and the value ignored.
Classes implementing the V2 SDK equivalent, `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`, can be declared in the configuration option `fs.s3a.audit.execution.interceptors`.
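For example (the interceptor class name here is hypothetical; substitute your own `ExecutionInterceptor` implementation):

```xml
<property>
  <name>fs.s3a.audit.execution.interceptors</name>
  <value>org.example.MyExecutionInterceptor</value>
</property>
```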