This looks fairly involved. The main differences from the signature version currently supported by JetS3t are the inclusion of HTTP headers (some of them signed) in the canonical request description, and the inclusion of the algorithm and credential scope in the "string to sign".
See the documentation with pseudo-code [1] and the accompanying test suite resources [2] (it's nice there's a test suite, but worrying that it's necessary).
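To make the canonical request part more concrete, here is a minimal Java sketch of how the request headers feed into it. It assumes simplified rules: header names are lower-cased, values are trimmed, headers are sorted by name, and the URI and query string are already encoded and sorted. The class and method names are hypothetical illustrations, not JetS3t's RestUtils API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class CanonicalRequestSketch {
    // Lower-case hex SHA-256, as used for the payload hash and the canonical request hash.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Lower-case header names, trim values, and sort by name (TreeMap keeps keys ordered).
    // Simplification: repeated internal spaces in values are not collapsed.
    static TreeMap<String, String> normalizeHeaders(Map<String, String> headers) {
        TreeMap<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sorted.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue().trim());
        }
        return sorted;
    }

    // Semicolon-separated list of signed header names, e.g. "date;host;x-amz-date".
    static String signedHeaders(Map<String, String> headers) {
        return String.join(";", normalizeHeaders(headers).keySet());
    }

    // Method, URI, query string, canonical headers, signed headers and payload hash,
    // separated by newlines. Each canonical header line ends in '\n', so a blank line
    // separates the header block from the signed-headers line.
    static String canonicalRequest(String method, String uri, String query,
                                   Map<String, String> headers, String payloadHash) {
        StringBuilder canonicalHeaders = new StringBuilder();
        for (Map.Entry<String, String> e : normalizeHeaders(headers).entrySet()) {
            canonicalHeaders.append(e.getKey()).append(':').append(e.getValue()).append('\n');
        }
        return method + "\n" + uri + "\n" + query + "\n"
            + canonicalHeaders + "\n"
            + signedHeaders(headers) + "\n"
            + payloadHash;
    }
}
```

With the four headers from the HEAD request trace later in this thread, the sorted signed-header list comes out as date;host;x-amz-content-sha256;x-amz-date, which matches the SignedHeaders value in that trace.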
Initial work to support AWS Signature version 4, re #183
Just the bare bones are in place at this point: some supporting hashing
utility methods in ServiceUtils, and utility methods (with tests) in
RestUtils for building the various components necessary to generate the
version 4 signature.
- Add property setting to configure signature version, and corresponding
  switching logic in RestStorageService to apply the new or old
  signature versions (this work is underway).
- Automatic SHA256 hashing of data pending upload, as is necessary to
  sign PUT requests.
- See if there's a way to avoid needing pre-hashing of data pending
  upload, which would break all streaming uses...
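A minimal Java sketch of hashing a payload by streaming it through a DigestInputStream. It illustrates why version 4 signing sits awkwardly with streaming: the whole payload has to be read before the x-amz-content-sha256 value, and therefore the signature, can be computed. PayloadHashSketch and sha256HexOfStream are hypothetical names, not part of ServiceUtils.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PayloadHashSketch {
    // Reads the stream to the end and returns the lower-case hex SHA-256 of its bytes.
    // The stream is consumed and closed: to upload the same data afterwards the caller
    // must re-open it or have buffered it, which is what breaks one-shot streaming uploads.
    static String sha256HexOfStream(InputStream in) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buffer = new byte[8192];
            while (dis.read(buffer) != -1) {
                // Each read() also updates the digest; the bytes themselves are discarded.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```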
Man, they have made request signing super-complicated. I'm wading through the morass a little at a time, but I'm not optimistic I will be able to get version 4 signing working for all circumstances without some deep refactoring of JetS3t.
Yes, there seems to be a weird bug/quirk where that header is required, or else S3 uses the RFC 822 format Date header in the string to sign, contrary to the documentation, meaning the signature will never be correct.
Either I'm missing something, or this request signature version isn't fully baked, or maybe both...
@DavidD Regarding the RequestTimeTooSkewed errors, which of the date values is correct? They should obviously both be the same (GMT/UTC times), but I assume it's the x-amz-date header that is off by an hour?
@DavidD If I look through enough JetS3t code I rediscover all the strange issues I have already fixed in the past. I think my last commit, which explicitly sets the GMT timezone for the parser/formatter used for the signature timestamps, should fix that problem.
Since I'm currently in the GMT/UTC timezone myself, I am more prone to these kinds of bugs because they don't show up in my own testing.
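For reference, here is a sketch of timestamp formatters pinned to GMT. It assumes the x-amz-date form yyyyMMdd'T'HHmmss'Z' seen in the request trace in this thread, the yyyyMMdd date stamp used in the credential scope, and an RFC 822 style Date header. The class name is hypothetical and this is not the actual JetS3t fix.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class SignatureTimestampSketch {
    // A fresh formatter per call, because SimpleDateFormat is not thread-safe.
    // Pinning the zone to GMT is the important part: otherwise the JVM default zone is used,
    // which only happens to be correct on machines already running in GMT/UTC.
    private static SimpleDateFormat gmtFormat(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format;
    }

    // x-amz-date header value, e.g. 20141102T074528Z
    static String amzDate(Date when) {
        return gmtFormat("yyyyMMdd'T'HHmmss'Z'").format(when);
    }

    // Date stamp used in the credential scope, e.g. 20141102
    static String dateStamp(Date when) {
        return gmtFormat("yyyyMMdd").format(when);
    }

    // RFC 822 style Date header value, e.g. Sun, 02 Nov 2014 07:45:28 GMT
    static String rfc822Date(Date when) {
        return gmtFormat("EEE, dd MMM yyyy HH:mm:ss z").format(when);
    }
}
```

Because the zone is set explicitly on each formatter, the output stays the same even when the JVM's default time zone is not GMT.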
@DavidD To be honest I'm not sure how to handle the new requirement that you know a bucket's region before you sign the request. I can't think of a clean way of handling it, or of automating this work in JetS3t so users don't need to worry about it.
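The region has to be known up front because it is baked into both the credential scope and the signing key. Below is a minimal Java sketch of that derivation using the standard AWS4 HMAC-SHA256 chain (date, region, service, "aws4_request"), plus a string to sign made of the algorithm, timestamp, credential scope and hex SHA-256 of the canonical request. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SigningKeySketch {
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // e.g. "20141102/us-east-1/s3/aws4_request", as in the Credential of the trace in this thread
    static String credentialScope(String dateStamp, String region, String service) {
        return dateStamp + "/" + region + "/" + service + "/aws4_request";
    }

    // Algorithm, x-amz-date timestamp, credential scope and hex SHA-256 of the canonical request.
    static String stringToSign(String amzDate, String scope, String canonicalRequest) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(canonicalRequest.getBytes(StandardCharsets.UTF_8));
            return "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n" + hex(hash);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Chained HMACs: the region is an input to the key itself, so it must be known before signing.
    static byte[] signingKey(String secretKey, String dateStamp, String region, String service) {
        byte[] kDate = hmacSha256(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), dateStamp);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    static String signature(String secretKey, String dateStamp, String region,
                            String service, String stringToSign) {
        return hex(hmacSha256(signingKey(secretKey, dateStamp, region, service), stringToSign));
    }
}
```

Signing the same string for us-east-1 and for eu-central-1 yields different signatures, so a request signed with a guessed region is rejected by an endpoint in another region.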
A helper method like RestUtils.awsHostnameForRegion could make it easier to set the appropriate Host endpoint prior to sending requests, but having to set a different Host when the default s3.amazonaws.com endpoint previously worked seems like a nasty hack. Plus you would need an extra lookup request to find out the bucket's region before you even get that far. Can you even look up a bucket's region using virtual-host domains? You might need to do all such lookups using requests to the default region at s3.amazonaws.com with the bucket name in the path.
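A sketch of what such a helper might look like. Nothing here reproduces a real JetS3t method: awsHostnameForRegion follows the name floated above, and the "s3-<region>.amazonaws.com" hostname pattern is an assumption modelled on the regional host mentioned at the end of this thread (s3-us-east-1.amazonaws.com). AWS's endpoint list should be checked, since not every region is guaranteed to use this form.

```java
class RegionHostnameSketch {
    // Hypothetical helper in the spirit of the RestUtils.awsHostnameForRegion idea.
    // ASSUMPTION: non-default regional endpoints follow "s3-<region>.amazonaws.com".
    static String awsHostnameForRegion(String region) {
        if (region == null || region.isEmpty() || "us-east-1".equals(region)) {
            return "s3.amazonaws.com"; // default endpoint
        }
        return "s3-" + region + ".amazonaws.com";
    }

    // Path-style URL (bucket name in the path rather than in the Host), as suggested for
    // lookups against the default endpoint. Assumes bucket and key are already URL-encoded.
    static String pathStyleUrl(String region, String bucket, String key) {
        return "https://" + awsHostnameForRegion(region) + "/" + bucket + "/" + key;
    }
}
```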
An alternative might be to check for errors in request signing due to an incorrect region, and then hack the request's Host to use the right region before sending a retry.
@James Murty I agree with your assessment. This very much looks like a chicken-and-egg problem: I cannot look up the region of a bucket that is in eu-central-1 (or any future location that uses AWS4 signatures) without being able to sign with AWS4.
Fix and retry requests that fail due to wrong region in signature, re #183
This change permits requests that failed due to AWS version 4 signing
issues to succeed upon retry in two cases:
- after the request is redirected to the appropriate region-specific
  endpoint by an S3 307 response, e.g. for an object PUT to a bucket in a
  non-default region.
- when S3 does not supply a redirect but instead returns an
  "AuthorizationHeaderMalformed" error containing the expected Region
  as part of the XML error response, in which case the request's Host
  endpoint is adjusted to point to the appropriate region.
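A sketch of pulling the expected region out of the XML error body. It assumes, as described above, that the response is an Error document whose Code is AuthorizationHeaderMalformed and which carries a Region element. It uses only the JDK's DOM parser; the class name is hypothetical and this is not how JetS3t itself parses error responses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class RegionFromErrorSketch {
    // Returns the region named in an AuthorizationHeaderMalformed error body, or null if the
    // body is empty, unparseable, a different error, or has no Region element.
    static String expectedRegion(String errorXml) {
        if (errorXml == null || errorXml.trim().isEmpty()) {
            return null; // e.g. a HEAD response, which has no body to inspect
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(errorXml.getBytes(StandardCharsets.UTF_8)));
            if (!"AuthorizationHeaderMalformed".equals(firstText(doc, "Code"))) {
                return null;
            }
            return firstText(doc, "Region");
        } catch (Exception e) {
            return null; // not well-formed XML: treat the region as unknown
        }
    }

    private static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }
}
```

Returning null rather than throwing lets the caller fall back to ordinary error handling when the region cannot be recovered.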
@DavidD Although I think it's a bit of an ugly hack, I have tried modifying a request's Host endpoint if S3 returns a region-specific signing error. When combined with another fix to re-authenticate requests that S3 redirects to the appropriate region endpoint, JetS3t should now handle most cases I can think of where the region is not known in advance.
There may be a lot of extra unnecessary request-then-retry cycles for buckets in regional locations if the service's endpoint cannot be known and set in advance, but I think that's the best that can be done.
See what you think, and see in particular the updated unit tests that show things working with the new "eu-central-1" region with no fore-knowledge of that region.
@James Murty I see, at least the correct region is given in the 400 XML response. This is a great workaround. I still have issues with HEAD requests, signed with the empty-string payload hash, failing with a 400 and no XML message:
HEAD /ff9d15c2-30cf-4182-956b-3424eee50467 HTTP/1.1
Date: Sun, 02 Nov 2014 07:45:28 GMT
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Host: test.cyberduck.ch.s3.amazonaws.com
x-amz-date: 20141102T074528Z
Authorization: AWS4-HMAC-SHA256 Credential=X/20141102/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=247ddbd6855a2fda1332356d1f24367460f3977400b081264ece346dd8893d78
Connection: Keep-Alive
Accept-Encoding: gzip,deflate

HTTP/1.1 400 Bad Request
x-amz-request-id: 176A6F8EA64A8988
x-amz-id-2: x9sZl/bdj7G7kmrokNY1fdLcvbA7ihq4xuG4wA1gUr0JnK3rzvuDXzsdrGxEla/Y
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sun, 02 Nov 2014 07:45:28 GMT
Connection: close
Server: AmazonS3
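One quick diagnostic for traces like this: the region a request was signed for is visible in the Credential part of the Authorization header (here us-east-1). Below is a hypothetical Java helper that extracts it, assuming the Credential=key/date/region/service/aws4_request layout shown in the trace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CredentialScopeSketch {
    // Matches Credential=<access-key>/<yyyyMMdd>/<region>/<service>/aws4_request
    private static final Pattern CREDENTIAL = Pattern.compile(
        "Credential=([^/,\\s]+)/(\\d{8})/([^/]+)/([^/]+)/aws4_request");

    // Returns the region the request was signed for, or null if there is no AWS4 credential.
    static String signedRegion(String authorizationHeader) {
        if (authorizationHeader == null) {
            return null;
        }
        Matcher m = CREDENTIAL.matcher(authorizationHeader);
        return m.find() ? m.group(3) : null;
    }
}
```

If the bucket actually lives in another region, such as eu-central-1, a mismatch here is one likely reason for the 400, although the empty response body doesn't confirm it.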
Demonstrate minimal GET instead of HEAD when region unknown, re #183
HEAD requests to a target bucket in an unknown region are likely to fail
with an invalid signature when using AWS signature version 4, since you
need to know the region in advance. JetS3t cannot work around this problem
as it can for GET requests, because the error returned by S3 doesn't
include enough information to discover the region name that should be used.
This test case update shows how to use a minimal GET that retrieves only
a single byte (the least possible) as a technique that might be
acceptable instead of using unrecoverable HEAD requests.
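The single-byte GET boils down to a Range header of bytes=0-0. Here is a minimal sketch that uses the JDK 11 java.net.http request builder purely for illustration (JetS3t itself uses Apache HttpClient). It only builds the request, omits signing, and does not send anything.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class MinimalGetSketch {
    // Builds (but does not send) a GET asking for only the first byte of the object.
    // "bytes=0-0" is an inclusive range covering byte 0 alone.
    static HttpRequest firstByteRequest(String objectUrl) {
        return HttpRequest.newBuilder(URI.create(objectUrl))
            .header("Range", "bytes=0-0")
            .GET()
            .build();
    }
}
```

A successful reply should be a 206 Partial Content response whose Content-Range header (bytes 0-0/total) also reveals the object's full size, much as a HEAD would. An empty object may produce an error such as 416 instead, so that case is worth checking.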
@DavidD The HEAD issue has me beat: I have tried, but could not find a way to discover the expected AWS region from the HEAD error message. Although the error response in this case has an XML document content-type, I cannot access it in JetS3t because either S3 doesn't provide an XML document, or HttpClient doesn't make it available. Which makes sense, given it is supposed to be a body-less HEAD response after all...
The only work-around I can think of is to use a GET with a zero byte-range instead, which in my testing limits the data returned to a single byte (see the test case update in the commit above).
@James Murty Thanks for your educated work on this. I just compared with the AWS CLI to find out about the issue, and it requires the region name to always be provided with requests.
Would it be feasible to have a callback to a protected method in RestStorageService very much like isRecoverable403 to allow the client to determine the region and try again?
Also, given the current expensive approach of duplicating requests with a modified Host header based on the first XML error response, would it be possible to cache this region information, or somehow allow the caller to provide the region in subsequent requests?
@DavidD I have added a bucketName-to-region caching mechanism to JetS3t as you suggested, as a further work-around for the HEAD request issue, and to greatly reduce the number of fix-then-retry cycles JetS3t will need to perform in normal use.
This all involves much more black magic and fragile code than I would like, but I can't see a better way to keep JetS3t fairly usable without the (IMO) completely unreasonable requirement that users provide the region name with every request. Plus I really don't want to add an extra optional region parameter to every single operation in the library. It all seems like madness to me.
Let me know what you think. It would be great if you could also run tests on the caching mechanism and its logic for reverse-engineering a bucket name from a request, as there are bound to be bugs.
Cache bucket-to-region mappings to assist AWS version 4 signing, re #183
Add a whole lot of region-aware smarts to JetS3t to try and hide the
fact that AWS signature version 4 requires much more fore-knowledge of
bucket-to-region mappings than is reasonable. All this work is verging
much more towards magic than I'm happy with and is sure to lead to
strange issues, but the alternative is dire for users.
RestStorageService now uses the new RegionEndpointCache to cache
known mappings from bucket names to regions. These mappings could be
inserted manually, and are also added automatically by JetS3t when it
is able to learn about bucket-to-region mappings (such as from prior
requests to the same bucket using a region-specific Host, or after
requests that fail with a clear region-related error where the correct
region is supplied by S3).
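A minimal sketch of a bucket-to-region cache of the kind described here. It is a hypothetical illustration built on a ConcurrentHashMap, and does not reproduce the actual API of JetS3t's RegionEndpointCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BucketRegionCacheSketch {
    // Thread-safe map from bucket name to region name. ConcurrentHashMap rejects nulls,
    // so null names are ignored rather than stored.
    private final Map<String, String> regionByBucket = new ConcurrentHashMap<>();

    // Record a mapping, e.g. learned from a redirect or an AuthorizationHeaderMalformed error.
    void put(String bucketName, String region) {
        if (bucketName != null && region != null) {
            regionByBucket.put(bucketName, region);
        }
    }

    String get(String bucketName) {
        return bucketName == null ? null : regionByBucket.get(bucketName);
    }

    // Drop a mapping that turns out to be stale, e.g. a bucket deleted and re-created elsewhere.
    void remove(String bucketName) {
        if (bucketName != null) {
            regionByBucket.remove(bucketName);
        }
    }

    // Region to sign with: the cached mapping if there is one, otherwise the default.
    String regionForSigning(String bucketName, String defaultRegion) {
        String cached = get(bucketName);
        return cached != null ? cached : defaultRegion;
    }
}
```

A remove operation matters because a cached mapping can go stale. Without one, a wrong entry would keep causing signing failures instead of saving retries.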
This all means that HEAD requests to a bucket in an unknown region can
succeed for the first time, provided the bucket has a mapping in
@James Murty I would still like to see a callback from an initial 400 InvalidRequest error response ("The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.") that would allow me, the caller, to switch to AWS4-HMAC-SHA256 only when accessing buckets in eu-central-1. That would let me support eu-central-1 without switching the default signature version just yet, as a conservative measure in a bugfix release.
Add automatic switching to AWS version 4 signatures if necessary, re #183
If a service not configured to use AWS4-HMAC-SHA256 signatures sends a
request to an endpoint that requires these signatures -- as indicated
by a 400 InvalidRequest error with a message containing "Please use
AWS4-HMAC-SHA256" -- JetS3t automatically retries the request using the
necessary signing mechanism.
Also, a service that uses legacy (AWS2) signing by default will check
the new bucketName-to-region cache before signing requests and will
switch to AWS4-HMAC-SHA256 if a region is mapped for a request's
All of which means that JetS3t should mostly just work for people using
buckets in regions that require AWS4-HMAC-SHA256 signatures, like
eu-central-1, without the need to change the default signing mechanism
used by the service.
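A sketch of the two switching rules described in this commit message: retry with AWS4-HMAC-SHA256 after a 400 InvalidRequest whose message contains "Please use AWS4-HMAC-SHA256", and sign with AWS4-HMAC-SHA256 up front when a region is already known for the bucket. The enum and method names are hypothetical. Matching on the message text is inherently fragile, which fits the "black magic" caveat earlier in the thread.

```java
class SignatureSwitchSketch {
    enum SignatureVersion { AWS2, AWS4_HMAC_SHA256 }

    // True if an error response indicates the endpoint only accepts AWS4-HMAC-SHA256.
    static boolean requiresAws4(int status, String errorCode, String errorMessage) {
        return status == 400
            && "InvalidRequest".equals(errorCode)
            && errorMessage != null
            && errorMessage.contains("Please use AWS4-HMAC-SHA256");
    }

    // Version to use when retrying a request that failed with the given error.
    static SignatureVersion versionForRetry(SignatureVersion current, int status,
                                            String errorCode, String errorMessage) {
        if (current == SignatureVersion.AWS2 && requiresAws4(status, errorCode, errorMessage)) {
            return SignatureVersion.AWS4_HMAC_SHA256;
        }
        return current;
    }

    // Version to use before sending: a known (cached) region for the bucket implies AWS4.
    static SignatureVersion versionForRequest(SignatureVersion configured, String cachedRegion) {
        return cachedRegion != null ? SignatureVersion.AWS4_HMAC_SHA256 : configured;
    }
}
```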
NOTE: It is still preferable to configure the service to use
AWS4-HMAC-SHA256 signatures if feasible, since JetS3t's automatic retry
mechanism will need more retry requests to succeed.
@DavidD Thanks for the kudos, and confirmation things are mostly working okay with AWS4-HMAC-SHA256. Regarding handling the 400 InvalidRequest case smoothly, I think I have a reasonable fix in the latest commit that will permit things to work fairly invisibly even if a service is configured to use the legacy signature method by default.
Are there test cases for multipart uploads available? I just wanted to note that mine do not cover the default behavior in the framework, as I use custom HTTP entities with my own x-amz-content-sha256 hash computation.
Yes, test/org/jets3t/service/TestRestS3Service.java has a set of test cases for multipart uploads. They're reasonable, though far from definitive as there's no way I can practically run tests that upload gigabytes, or even dozens of megabytes, over my slow upstream connection.
To try and manage this complexity and give me a fighting chance of
getting this right, the code to determine a request's bucket name is now
in the ServiceUtils class where it can be easily accessed for tests, and
is heavily unit-tested.
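A minimal sketch of the bucket-name lookup idea. It assumes virtual-host Hosts of the form bucket.s3.amazonaws.com, bucket.s3-region.amazonaws.com or bucket.s3.region.amazonaws.com, and otherwise treats the first path segment as the bucket. This is a hypothetical illustration, not the ServiceUtils implementation. Custom CNAME hosts and Hosts with port numbers are not handled.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BucketNameFromRequestSketch {
    // Bucket names may contain dots (e.g. test.cyberduck.ch), so everything before the
    // S3 endpoint suffix is taken as the bucket name.
    private static final Pattern VIRTUAL_HOST =
        Pattern.compile("^(.+)\\.s3(?:[.-][a-z0-9-]+)?\\.amazonaws\\.com$");

    static String bucketName(String host, String path) {
        if (host != null) {
            Matcher m = VIRTUAL_HOST.matcher(host.toLowerCase(Locale.ROOT));
            if (m.matches()) {
                return m.group(1);
            }
        }
        // Otherwise assume path style: the bucket is the first path segment.
        if (path == null) {
            return null;
        }
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        int slash = trimmed.indexOf('/');
        String first = slash >= 0 ? trimmed.substring(0, slash) : trimmed;
        return first.isEmpty() ? null : first;
    }
}
```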
Use better bucket name lookup when handling request redirects, re #183
Fix long-standing hack where requests weren't re-authorized after a 307
redirect as there was no good mechanism to tease out the exact bucket
name for the legacy signature process when the Host was altered to one
of the S3 regional variants, e.g. s3-us-east-1.amazonaws.com
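To show why knowing the exact bucket name is what makes re-authorizing after a 307 possible, here is a sketch of the legacy (AWS2) canonical resource. For virtual-host requests it is "/" + bucket + path, and for path-style requests it is the path as-is. In both cases the bucket name is needed even when the Host changes to a regional variant. The class name is hypothetical, and sub-resources such as ?acl are deliberately not handled.

```java
import java.util.Locale;

class LegacyCanonicalResourceSketch {
    // Legacy (AWS2) canonical resource names the bucket explicitly, whatever Host is used:
    //   virtual host: Host "<bucket>.s3-...amazonaws.com", path "/key" -> "/<bucket>/key"
    //   path style:   Host "s3-...amazonaws.com", path "/<bucket>/key" -> unchanged
    static String canonicalResource(String bucketName, String host, String path) {
        String safePath = (path == null || path.isEmpty()) ? "/" : path;
        boolean virtualHosted = bucketName != null && host != null
            && host.toLowerCase(Locale.ROOT).startsWith(bucketName.toLowerCase(Locale.ROOT) + ".");
        return virtualHosted ? "/" + bucketName + safePath : safePath;
    }
}
```

As long as the bucket name can be recovered, a request redirected from one addressing style or regional Host to another produces the same canonical resource, so it can be signed again before the retry.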