Support for AWS Signature Version 4

Issue #183 resolved
DavidD
created an issue

No description provided.

Comments (47)

  1. James Murty repo owner

    This looks fairly involved. The main differences from the current signature version supported by JetS3t are the inclusion of HTTP headers (some signed) in the canonical request description, and algorithm & credential scope in the "string to sign".

    See the documentation with pseudo-code 1 and the accompanying test suite resources 2 (it's nice there's a test suite, but worrying that it's necessary)
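As a rough illustration of the "string to sign" and credential scope mentioned above, here is a minimal sketch of the publicly documented version 4 derivation steps. The class and method names are hypothetical and are not JetS3t APIs; the canonical request is taken as an already-built input, and the service name is assumed to be "s3".

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch only; these names are hypothetical, not JetS3t APIs. */
public class SigV4Sketch {

    /** Lower-case hex encoding of a byte array. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return hex(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** timestamp is the x-amz-date value, e.g. 20141102T074528Z. */
    static String stringToSign(String timestamp, String region, String canonicalRequest) {
        // Credential scope: date / region / service / terminator.
        String scope = timestamp.substring(0, 8) + "/" + region + "/s3/aws4_request";
        return "AWS4-HMAC-SHA256\n" + timestamp + "\n" + scope + "\n"
                + sha256Hex(canonicalRequest);
    }

    /** Chained HMACs over date, region, service and a fixed terminator. */
    static byte[] signingKey(String secretKey, String date, String region) {
        byte[] kDate = hmacSha256(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, "s3");
        return hmacSha256(kService, "aws4_request");
    }

    static String signature(String secretKey, String timestamp, String region,
                            String canonicalRequest) {
        byte[] key = signingKey(secretKey, timestamp.substring(0, 8), region);
        return hex(hmacSha256(key, stringToSign(timestamp, region, canonicalRequest)));
    }

    public static void main(String[] args) {
        // Placeholder input: a real canonical request is built from the method,
        // URI, query string, headers, signed header names and payload hash.
        String canonicalRequest = "placeholder canonical request";
        System.out.println(
                signature("EXAMPLE-SECRET", "20141102T074528Z", "us-east-1", canonicalRequest));
    }
}
```

Because the region is part of both the scope and the signing key, a signature computed for the wrong region cannot match, which is the root of the region problems discussed later in this thread.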

  2. James Murty repo owner

    Initial work to support AWS Signature version 4, re #183

    Just the bare bones are in place at this point, some supporting hashing utility methods in ServiceUtils and utility methods (with tests) for building the various components necessary to generate the version 4 signature in RestUtils.

    Pending work:

    • add property setting to configure signature version and corresponding switching logic in RestStorageService to apply the new or old signature versions (this work is underway)
    • automatic SHA256 hashing of data pending upload, as is necessary to sign PUT requests
    • see if there's a way to avoid needing pre-hashing of data pending upload, which would break all streaming uses...
    • fill out test cases from Amazon's examples at docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
    • manual and automated testing against the real S3 to see how much of JetS3t continues working with the new signature version.

    → <<cset 453c0fcd65d5>>

  3. James Murty repo owner

    Man, they have made request signing super-complicated. I'm wading through the morass a little at a time, but I'm not optimistic I will be able to get version 4 signing working for all circumstances without some deep refactoring of JetS3t.

  4. DavidD reporter

    Tests fail here with a RequestTimeTooSkewed response: "The difference between the request time and the current time is too large."

    Example date headers sent for a failing request are

    Date: Fri, 31 Oct 2014 18:17:47 GMT
    x-amz-date: 20141031T191747Z
    

    I suspect an issue with Daylight saving time.

  5. James Murty repo owner

    Yes, there seems to be a weird bug/quirk where that header is required or S3 uses the RFC822 format date header in the string to sign, contrary to the documentation, meaning the signature will never be correct.

    Either I'm missing something, or this signature request version isn't fully baked, or maybe both...

  6. James Murty repo owner

    AWS version 4 signing now works with payloads if SHA256 pre-set, re #183

    Requests with a payload are now passed through the signature signing process correctly, provided the SHA256 hash value is pre-set on the object as the "x-amz-content-sha256" header.

    The S3Object helper constructors that build an object from a string or a file automatically calculate and set this value, so objects constructed the default way will just work.

    Also unit-tested URI signing for non-Latin characters, which revealed some URI encoding problems that are now fixed.

    → <<cset d8243969cc1c>>
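For readers who set the "x-amz-content-sha256" value themselves, a minimal sketch of hashing a payload stream follows. It assumes the payload can be read twice (once to hash, once to upload); the class and method names are hypothetical and this is not the code JetS3t uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only; not the JetS3t implementation. */
public class PayloadHashSketch {

    /** Reads the stream to its end and returns the lower-case hex SHA-256 digest. */
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Convenience overload for in-memory payloads. */
    static String sha256Hex(byte[] data) {
        return sha256Hex(new ByteArrayInputStream(data));
    }

    public static void main(String[] args) {
        // The hash of an empty payload, as sent with body-less requests.
        System.out.println(sha256Hex(new byte[0]));
    }
}
```

Hashing through a fixed-size buffer keeps memory use constant, but it still means reading the data once before the upload starts, which is the streaming concern raised earlier in this thread.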

  7. James Murty repo owner

    AWS region and SHA256 hash now set automatically in many cases, re #183

    • AWS region used for AWS version 4 signing is now automatically derived from the request Host endpoint
    • SHA256 header value is set automatically for many use-cases if it isn't already provided by the object being acted on
    • Unit tests now include a fairly involved workflow involving creating a bucket in a non-US location to verify signing of alternate regions.

    → <<cset 9b6904e91266>>
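One way to picture deriving the region from the Host endpoint is the sketch below. It is an assumption-laden illustration, not JetS3t's actual logic: the class name, the regular expression, and the fallback to "us-east-1" are all hypothetical, and special-purpose endpoint names are not handled.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; the pattern and default region are assumptions. */
public class RegionFromHostSketch {

    static final String DEFAULT_REGION = "us-east-1";

    // Matches region-specific hosts such as s3-eu-west-1.amazonaws.com or
    // s3.eu-central-1.amazonaws.com, optionally prefixed by a bucket name.
    private static final Pattern REGIONAL_HOST =
            Pattern.compile("(?:^|\\.)s3[.-]([a-z0-9-]+)\\.amazonaws\\.com$");

    static String regionForHost(String host) {
        Matcher m = REGIONAL_HOST.matcher(host.toLowerCase(Locale.ROOT));
        return m.find() ? m.group(1) : DEFAULT_REGION;
    }

    public static void main(String[] args) {
        System.out.println(regionForHost("s3-eu-west-1.amazonaws.com"));
        System.out.println(regionForHost("test.cyberduck.ch.s3.amazonaws.com"));
    }
}
```

A host-based approach like this can only fall back to the default when the bucket is addressed through the generic s3.amazonaws.com endpoint, since that host name carries no region information.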

  8. James Murty repo owner

    @DavidD regarding the RequestTimeTooSkewed errors, which of the date values is correct? They should both be the same obviously – GMT/UTC times – but I assume it's the x-amz-date header which is off by an hour?

  9. James Murty repo owner

    Potential fix of incorrect timestamp for AWS version 4 signing, re #183

    Ensure AWS-flavoured ISO8601 format timestamp parser/formatter is set to the GMT timezone.

    Also made parsing/formatting of these timestamps threadsafe.

    → <<cset 7ed75936a307>>
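A minimal sketch of a thread-safe formatter pinned to GMT/UTC is given here for illustration. It uses java.time (Java 8 or later), which is an assumption on my part and not necessarily what the commit above does; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Illustrative sketch only; not the JetS3t implementation. */
public class AwsTimestampSketch {

    // DateTimeFormatter instances are immutable, so sharing them across
    // threads is safe. The zone is fixed to UTC rather than the JVM default.
    static final DateTimeFormatter AWS_ISO8601 =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    static final DateTimeFormatter RFC822_GMT =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Value for the x-amz-date header. */
    static String formatAwsTimestamp(Instant instant) {
        return AWS_ISO8601.format(instant);
    }

    /** Value for the Date header. */
    static String formatRfc822(Instant instant) {
        return RFC822_GMT.format(instant);
    }

    static Instant parseAwsTimestamp(String text) {
        return AWS_ISO8601.parse(text, Instant::from);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Both headers describe the same instant, so their hours must agree.
        System.out.println("Date: " + formatRfc822(now));
        System.out.println("x-amz-date: " + formatAwsTimestamp(now));
    }
}
```

With both formatters fixed to UTC, the two headers cannot drift apart by the local zone or daylight-saving offset, which is the one-hour mismatch visible in the failing request quoted earlier.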

  10. James Murty repo owner

    @DavidD If I look through enough JetS3t code I re-find all the strange issues I already fixed in the past – I think my last commit that explicitly sets the GMT timezone for the parser/formatter for the signature timestamps should fix that problem.

    Since I'm currently in the GMT/UTC timezone myself I will be more prone to these kinds of bugs.

  11. DavidD reporter

    @James Murty My tests are currently failing because RestUtils#awsRegionForRequest is returning the default region for a virtual-host named bucket.

    GET https://test.cyberduck.ch.s3.amazonaws.com:443/?max-keys=1000&prefix=empty%2F&delimiter=%2F HTTP/1.1

    Any way to override this?

  12. James Murty repo owner

    @DavidD To be honest I'm not sure how to handle the new requirement that you know a bucket's region in advance before you sign the request. I can't think of a clean way of handling it, or of automating this work in JetS3t so users don't need to worry about it.

    A helper method like RestUtils.awsHostnameForRegion could make it easier to set the appropriate Host endpoint prior to sending requests, but having to set a different Host when the default s3.amazonaws.com endpoint previously worked seems like a nasty hack. Plus you would need an extra lookup request to find out the bucket's region before you even get that far. Can you even look up a bucket's region using virtual-host domains? You might need to do all such lookups using requests to the default region at s3.amazonaws.com with the bucket name in the path.

    An alternative might be to check for errors in request signing due to an incorrect region, and then hack the request's Host to use the right region before sending a retry.

  13. DavidD reporter

    @James Murty I agree with your assessment. This very much looks like a chicken-and-egg problem. I cannot look up the region of a bucket that is in eu-central-1 (or any future location that uses AWS4 signatures) without being able to sign with AWS4.

  14. James Murty repo owner

    Fix and retry requests that fail due to wrong region in signature, re #183

    This change permits requests that failed due to AWS version 4 signing issues to succeed upon retry in two cases:

    • after request is redirected to appropriate region-specific endpoint by S3 307 response, e.g. for object PUT to bucket in non-default region
    • when S3 does not supply a redirect but instead returns an "AuthorizationHeaderMalformed" error containing the expected Region as part of the XML error response, in which case the request's Host endpoint is adjusted to point to the appropriate region.

    → <<cset fdce4307a798>>
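To illustrate the second case, the following sketch pulls the expected region out of an error document. The sample XML is an assumed shape based on the description above (an error with a Region element), not a captured S3 response, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Illustrative sketch only; the sample XML is an assumed shape. */
public class RegionFromErrorSketch {

    // Assumed example of an error body; real responses carry more elements.
    static final String SAMPLE_ERROR =
            "<Error>"
            + "<Code>AuthorizationHeaderMalformed</Code>"
            + "<Region>eu-central-1</Region>"
            + "</Error>";

    /** Returns the text of the first Region element, or null if there is none. */
    static String expectedRegion(String errorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(errorXml)));
            NodeList regions = doc.getElementsByTagName("Region");
            return regions.getLength() > 0
                    ? regions.item(0).getTextContent().trim()
                    : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable error document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(expectedRegion(SAMPLE_ERROR));
    }
}
```

A caller could then rebuild the request against the endpoint for the returned region and sign it again before retrying.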

  15. James Murty repo owner

    @DavidD Although I think it's a bit of an ugly hack, I have tried modifying a request's Host endpoint if S3 returns a region-specific signing error. When combined with another fix to re-authenticate requests that S3 redirects to the appropriate region endpoint, JetS3t should now handle most cases I can think of where the region is not known in advance.

    There may be a lot of extra unnecessary request-then-retry cycles for buckets in regional locations if the service's endpoint cannot be known and set in advance, but I think that's the best that can be done.

    See what you think, and see in particular the updated unit tests that show things working with the new "eu-central-1" region with no fore-knowledge of that region.

  16. DavidD reporter

    @James Murty I see, at least the known region is given in the 400 XML response. This is a great workaround. I still have issues with HEAD requests failing with 400 and no XML message with the empty string hash.

    Sample request

    HEAD /ff9d15c2-30cf-4182-956b-3424eee50467 HTTP/1.1
    Date: Sun, 02 Nov 2014 07:45:28 GMT
    x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    Host: test.cyberduck.ch.s3.amazonaws.com
    x-amz-date: 20141102T074528Z
    Authorization: AWS4-HMAC-SHA256 Credential=X/20141102/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=247ddbd6855a2fda1332356d1f24367460f3977400b081264ece346dd8893d78
    Connection: Keep-Alive
    Accept-Encoding: gzip,deflate

    HTTP/1.1 400 Bad Request
    x-amz-request-id: 176A6F8EA64A8988
    x-amz-id-2: x9sZl/bdj7G7kmrokNY1fdLcvbA7ihq4xuG4wA1gUr0JnK3rzvuDXzsdrGxEla/Y
    Content-Type: application/xml
    Transfer-Encoding: chunked
    Date: Sun, 02 Nov 2014 07:45:28 GMT
    Connection: close
    Server: AmazonS3
    
  17. James Murty repo owner

    Demonstrate minimal GET instead of HEAD when region unknown, re #183

    HEAD requests to a target bucket with unknown region are likely to fail with an invalid signature when using AWS signature version 4, since you need to know the region in advance. JetS3t cannot work around this problem as it can for GET requests, as the error returned by S3 doesn't include enough information to discover the region name that should be used.

    This test case update shows how to use a minimal GET that retrieves only a single byte (the least possible) as a technique that might be acceptable instead of using unrecoverable HEAD requests.

    → <<cset be0efe041a69>>

  18. James Murty repo owner

    @DavidD The HEAD issue has me beat, I have tried but could not find a way to discover the expected AWS region from the HEAD error message. Although the error response in this case has an XML document content-type, I cannot access it in JetS3t because either S3 doesn't provide an XML document, or HttpClient doesn't make it available. Which makes sense, given it is supposed to be a body-less HEAD response after all...

    The only work-around I can think of is to use a GET with zero byte-range instead, which in my testing limits the data returned to a single byte (see the test case update in the commit above)
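As a rough sketch of the byte-range technique, the snippet below prepares (but does not send) a GET limited to the first byte, using the JDK's HttpURLConnection rather than JetS3t. The URL is a made-up placeholder and the request is unsigned, so this only shows the Range header involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Illustrative sketch only; prepares an unsigned request without sending it. */
public class MinimalGetSketch {

    static HttpURLConnection minimalGet(String url) {
        try {
            // openConnection() creates the object but does not touch the network.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            // Ask for the byte range 0-0, i.e. only the first byte of the object.
            conn.setRequestProperty("Range", "bytes=0-0");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bucket and key, used purely as placeholders.
        HttpURLConnection conn =
                minimalGet("https://example-bucket.s3.amazonaws.com/example-key");
        System.out.println(conn.getRequestMethod() + " Range: "
                + conn.getRequestProperty("Range"));
    }
}
```

Unlike HEAD, a failed GET can carry an XML error body, so the region hint remains available for a retry.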

  19. DavidD reporter

    @James Murty Thanks for your educated work on this. I just compared with the AWS CLI to find out about the issue, and they require the region name to always be provided with requests.

    Would it be feasible to have a callback to a protected method in RestStorageService very much like isRecoverable403 to allow the client to determine the region and try again?

    Also, given the current expensive approach of duplicating requests with a modified Host header based on the first XML error response, would it be possible to cache this region information or somehow allow the caller to provide the region in subsequent requests?

  20. James Murty repo owner

    @DavidD I have added a bucketName-to-region caching mechanism to JetS3t as you suggested, as a further work-around for the HEAD request issue, and to greatly reduce the number of fix-then-retry cycles JetS3t will need to perform in normal use.

    This all involves much more black-magic and fragile code than I would like, but I can't see a better way to keep JetS3t fairly usable without the (IMO) completely unreasonable requirement that users provide the region name with every request. Plus I really don't want to add an extra optional region parameter to every single operation in the library. It all seems like madness to me.

    Let me know what you think. It would be great if you could also run tests on the caching mechanism and its logic for reverse-engineering a bucket name from a request, as there are bound to be bugs.

  21. James Murty repo owner

    Refactored AWS version 4 signing utilities into separate class, re #183

    This stuff is complicated enough already without lumping it in with all the other utility cruft in RestUtils.

    Also factored SHA256 lookup/calculation code into a utility method to permit easier testing and debugging.

    → <<cset 6acbe6271f3b>>

  22. James Murty repo owner

    Cache bucket-to-region mappings to assist AWS version 4 signing, re #183

    Add a whole lot of region-aware smarts to JetS3t to try and hide the fact that AWS signature version 4 requires much more fore-knowledge of bucket-to-region mappings than is reasonable. All this work is verging much more towards magic than I'm happy with and is sure to lead to strange issues, but the alternative is dire for users.

    RestStorageService now uses the new RegionEndpointCache to cache known mappings from bucket names to regions. These mappings can be inserted manually, and are also added automatically by JetS3t when it is able to learn about bucket-to-region mappings (such as from prior requests to the same bucket using a region-specific Host, or after requests that fail with a clear region-related error where the correct region is supplied by S3).

    This all means that HEAD requests to a bucket in an unknown region can succeed for the first time, provided the bucket has a mapping in RegionEndpointCache.

    → <<cset a3a04a6ee9b5>>
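The idea of a bucket-to-region cache can be sketched in a few lines. This is a hypothetical stand-in written for illustration; it does not reproduce the API of JetS3t's RegionEndpointCache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative sketch only; not the RegionEndpointCache API. */
public class BucketRegionCacheSketch {

    // Concurrent map so that requests on different threads can share the cache.
    private final ConcurrentMap<String, String> regionByBucket = new ConcurrentHashMap<>();

    /** Records a mapping, whether set manually or learned from a response. */
    void put(String bucketName, String region) {
        regionByBucket.put(bucketName, region);
    }

    /** Returns the cached region, or null when the bucket's region is unknown. */
    String get(String bucketName) {
        return regionByBucket.get(bucketName);
    }

    /** Returns the cached region, or the supplied fallback when unknown. */
    String getOrDefault(String bucketName, String defaultRegion) {
        return regionByBucket.getOrDefault(bucketName, defaultRegion);
    }

    /** Drops a mapping, e.g. after the bucket is deleted. */
    void remove(String bucketName) {
        regionByBucket.remove(bucketName);
    }

    public static void main(String[] args) {
        BucketRegionCacheSketch cache = new BucketRegionCacheSketch();
        cache.put("example-bucket", "eu-central-1"); // hypothetical bucket name
        System.out.println(cache.getOrDefault("example-bucket", "us-east-1"));
        System.out.println(cache.getOrDefault("other-bucket", "us-east-1"));
    }
}
```

Seeding such a cache before the first request is what would let a HEAD request be signed for the right region without a failed attempt to learn from.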

  23. DavidD reporter

    @James Murty I would still like to see a callback from an initial 400 InvalidRequest error response ("The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.") that would allow me, the caller, to switch to AWS4-HMAC-SHA256 only when accessing buckets in eu-central-1. That would allow me to support eu-central-1 without switching the default signature version just yet, as a conservative measure in a bugfix release.

  24. James Murty repo owner

    Add automatic switching to AWS version 4 signatures if necessary, re #183

    If a service not configured to use AWS4-HMAC-SHA256 signatures sends a request to an endpoint that requires these signatures -- as indicated by a 400 InvalidRequest error with a message containing "Please use AWS4-HMAC-SHA256" -- JetS3t automatically retries the request using the necessary signing mechanism.

    Also, a service that uses legacy (AWS2) signing by default will check the new bucketName-to-region cache before signing requests and will switch to AWS4-HMAC-SHA256 if a region is mapped for a request's bucket name.

    All of which means that JetS3t should mostly just work for people using buckets in regions that require AWS4-HMAC-SHA256 signatures, like eu-central-1, without the need to change the default signing mechanism used by the service.

    NOTE: It is still preferable to configure the service to use AWS4-HMAC-SHA256 signatures if feasible, since JetS3t's automatic retry mechanism needs extra retry requests before a request succeeds.

    → <<cset a12688d156bd>>
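The two decisions described in this commit message can be sketched as simple predicates. The names below are hypothetical and the logic is a simplified reading of the description, not the actual RestStorageService code.

```java
/** Illustrative sketch only; a simplified reading of the behaviour described. */
public class SignatureFallbackSketch {

    static final String AWS4 = "AWS4-HMAC-SHA256";
    static final String LEGACY = "AWS2";

    /** True when an error response indicates the endpoint insists on AWS4 signing. */
    static boolean requiresAws4Retry(int httpStatus, String errorCode, String errorMessage) {
        return httpStatus == 400
                && "InvalidRequest".equals(errorCode)
                && errorMessage != null
                && errorMessage.contains("Please use " + AWS4);
    }

    /**
     * Chooses the signing mechanism before a request is sent: a service
     * configured for legacy signing still switches to AWS4 when a region
     * is already cached for the request's bucket.
     */
    static String chooseSignatureVersion(boolean aws4Configured, String cachedRegion) {
        return (aws4Configured || cachedRegion != null) ? AWS4 : LEGACY;
    }

    public static void main(String[] args) {
        System.out.println(requiresAws4Retry(400, "InvalidRequest",
                "The authorization mechanism you have provided is not supported. "
                + "Please use AWS4-HMAC-SHA256."));
        System.out.println(chooseSignatureVersion(false, "eu-central-1"));
    }
}
```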

  25. James Murty repo owner

    @DavidD Thanks for the kudos, and confirmation things are mostly working okay with AWS4-HMAC-SHA256. Regarding handling the 400 InvalidRequest case smoothly, I think I have a reasonable fix in the latest commit that will permit things to work fairly invisibly even if a service is configured to use the legacy signature method by default.

  26. DavidD reporter

    Are there test cases for multipart uploads available? Just wanted to note that mine do not cover the default behavior in the framework as I use custom HTTP entities with my own x-amz-content-sha256 hash computation.

  27. James Murty repo owner

    Yes, test/org/jets3t/service/TestRestS3Service.java has a set of test cases for multipart uploads. They're reasonable, though far from definitive as there's no way I can practically run tests that upload gigabytes, or even dozens of megabytes, over my slow upstream connection.

  28. James Murty repo owner

    Moved and tested complex logic to find request's bucket name, re #183

    The logic to figure out the bucket name for a given service request is very complex and error-prone, since it has to deal with a number of edge cases, including but not limited to:

    To try and manage this complexity and give me a fighting chance of getting this right, the code to determine a request's bucket name is now in the ServiceUtils class where it can be easily accessed for tests, and is heavily unit-tested.

    → <<cset 0d5eca21a905>>
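To give a feel for the problem, the sketch below distinguishes just two request styles: a bucket name carried in the Host (virtual-host style) and one carried as the first path segment (path style). It is a hypothetical illustration that ignores the harder edge cases and does not reproduce the ServiceUtils logic.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; covers two simple cases, not the real logic. */
public class BucketNameSketch {

    // An S3 endpoint with an optional region part, e.g. s3.amazonaws.com,
    // s3-eu-west-1.amazonaws.com or s3.eu-central-1.amazonaws.com.
    private static final Pattern S3_ENDPOINT =
            Pattern.compile("(?:^|\\.)s3(?:[.-][a-z0-9-]+)?\\.amazonaws\\.com$");

    /** Returns the bucket name, or null if none can be determined. */
    static String bucketName(String host, String path) {
        String h = host.toLowerCase(Locale.ROOT);
        Matcher m = S3_ENDPOINT.matcher(h);
        if (!m.find()) {
            return null; // not an endpoint this sketch recognises
        }
        if (m.start() > 0) {
            // Virtual-host style: everything before the endpoint is the bucket,
            // including any dots inside the bucket name.
            return h.substring(0, m.start());
        }
        // Path style: the bucket is the first path segment.
        String p = path.startsWith("/") ? path.substring(1) : path;
        int slash = p.indexOf('/');
        String first = slash >= 0 ? p.substring(0, slash) : p;
        return first.isEmpty() ? null : first;
    }

    public static void main(String[] args) {
        System.out.println(bucketName("test.cyberduck.ch.s3.amazonaws.com", "/"));
        System.out.println(bucketName("s3-eu-west-1.amazonaws.com", "/my-bucket/key"));
    }
}
```

Even this small version has to cope with dots inside bucket names, which hints at why the real logic benefits from heavy unit testing.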

  29. James Murty repo owner

    Use better bucket name lookup when handling request redirects, re #183

    Fix long-standing hack where requests weren't re-authorized after a 307 redirect as there was no good mechanism to tease out the exact bucket name for the legacy signature process when the Host was altered to one of the S3 regional variants, e.g. s3-us-east-1.amazonaws.com

    → <<cset f86b89ae643c>>
