Key rotation should require a delay between publishing a key and starting to use it?

Issue #1161 open
Joseph Heenan created an issue

https://openid.net/specs/openid-connect-core-1_0.html#RotateSigKeys says:

Rotation of signing keys can be accomplished with the following approach. The signer publishes its keys in a JWK Set at its jwks_uri location and includes the kid of the signing key in the JOSE Header of each message to indicate to the verifier which key is to be used to validate the signature. Keys can be rolled over by periodically adding new keys to the JWK Set at the jwks_uri location. The signer can begin using a new key at its discretion and signals the change to the verifier using the kid value. The verifier knows to go back to the jwks_uri location to re-retrieve the keys when it sees an unfamiliar kid value. The JWK Set document at the jwks_uri SHOULD retain recently decommissioned signing keys for a reasonable period of time to facilitate a smooth transition.

The “signer can begin using a new key at its discretion” seems potentially problematic - discussion within the certification (around a test intended to test RPs rotating keys, see https://www.heenan.me.uk/~joseph/oidcc_test_desc-phase1.html#OP_Rotation_RP_Sig ) revealed that OPs in larger distributed deployments will in some cases not react immediately to keys being added and a new kid being found. For example, to prevent a DoS attack, an OP may well decide not to refetch a JWKS it has fetched in the last 60 seconds.
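The refetch throttling described above can be sketched roughly as follows. This is a minimal Python sketch, not taken from any OP implementation: the class name `ThrottledKeyCache`, the `fetch_jwks` callable, and the injectable `clock` are all hypothetical, and the 60-second cooldown is simply the example figure from the paragraph above.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledKeyCache:
    """Caches keys by kid and refetches the JWKS at most once per cooldown."""

    def __init__(self,
                 fetch_jwks: Callable[[], Dict[str, str]],
                 cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        # fetch_jwks is a hypothetical callable returning {kid: key material}.
        self._fetch_jwks = fetch_jwks
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._last_fetch: Optional[float] = None

    def get_key(self, kid: str) -> Optional[str]:
        """Return the key for kid, or None if it is (still) unknown."""
        if kid in self._keys:
            return self._keys[kid]
        now = self._clock()
        # Unfamiliar kid: refetch only if the cooldown has elapsed, so a
        # flood of bogus kid values cannot trigger a flood of fetches.
        if self._last_fetch is None or now - self._last_fetch >= self._cooldown:
            self._keys = dict(self._fetch_jwks())
            self._last_fetch = now
        return self._keys.get(kid)
```

With a cache like this, a key published shortly after the last fetch stays invisible until the cooldown expires, which is the window in which a signature made with the new key would be rejected.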

I would suggest tweaking the text so that “The signer can begin using a new key at its discretion” becomes something like “The signer should wait at least a few minutes after it publishes the new key and then can begin using a new key at its discretion”
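On the signer side, the suggested wait could look something like the sketch below. It is illustrative only: `RotatingSigner`, its method names, and the injectable `clock` are hypothetical, and the 300-second default is an assumed stand-in for "a few minutes", not a value from the spec.

```python
import time
from typing import Callable, List, Optional, Tuple


class RotatingSigner:
    """Publishes a new key first, and only signs with it after a delay."""

    def __init__(self,
                 initial_kid: str,
                 activation_delay_seconds: float = 300.0,  # assumed value
                 clock: Callable[[], float] = time.monotonic):
        self._delay = activation_delay_seconds
        self._clock = clock
        self._active_kid = initial_kid
        self._pending: Optional[Tuple[str, float]] = None  # (kid, published_at)
        # kids currently listed in the JWK Set at the jwks_uri.
        self.published_kids: List[str] = [initial_kid]

    def publish_new_key(self, kid: str) -> None:
        """Add the key to the published set without signing with it yet."""
        self.published_kids.append(kid)
        self._pending = (kid, self._clock())

    def signing_kid(self) -> str:
        """Return the kid to sign with, promoting the pending key once due."""
        if self._pending is not None:
            kid, published_at = self._pending
            if self._clock() - published_at >= self._delay:
                self._active_kid = kid
                self._pending = None
        return self._active_kid
```

The old key stays in `published_kids` throughout, in line with the quoted text about retaining recently decommissioned keys for a reasonable period.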

Comments (8)

  1. Tom Jones

    I had some comments on the FAPI list, but also on the federation spec I think the following needs to be fixed in 8.2, Key Rollover for a Trust Anchor:

    “After a reasonable time period remove the old keys. What is regarded as a reasonable time is dependent on the security profile and risk assessment of the trust anchor.”

    The problem is the length of time that a key needs to be maintained. For signing, validation must continue to work for as long as the signed claim is valid for use; for encryption, for as long as any data exists that might need to be decrypted. The essential problem is that the key structures of OpenID are not sufficient to handle the needs of new docs like FAPI or federation. I have added nbf and exp to the keys in my implementation. There is a reason that they are in PKI certs.

  2. Brian Campbell

    In the interest of full disclosure, as the naive fool who originally wrote those potentially problematic words about key rotation[1], I can sometimes get a little defensive about criticism of them. But do keep in mind that Connect Core is a final published specification and errata changes can't/shouldn't break existing deployments or functionality. Also, as far as I know, this stuff has been working okay for many years.

    For normal behavior the "signer can begin using a new key at its discretion and signals the change to the verifier using the kid value" works okay. But yes there is an opportunity for abuse by a malicious actor sending lots of unknown kid values. The verifying party not refetching the jwks_uri content more than once per some smallish time period, as Filip described, is a reasonable means of guarding against something like that. But a legitimate key roll to a newly published key during an 'attack' like that with those protections in place might result in some service disruption with erroneous invalid signatures until the new key at the jwks_uri is successfully retrieved. If the signer also waits a bit after publishing the new key before using it, that service disruption would presumably be avoided because (as long as the throttling period and the wait period matched up okay) the verifying party would have gotten the new key at some point previously.

    I think that mandating that stuff is probably too far for an errata 6 or 7 years later. But maybe some text with suggestions along those lines and descriptions of why would be useful and reasonable for an errata?

    [1] https://bitbucket.org/openid/connect/commits/f2d3a27ded548185a2b0cdbee1d126b119853e3e#Lopenid-connect-messages-1_0.xmlT2556

  3. Joseph Heenan reporter

    Thanks Brian.

    It probably wasn’t clear from my original message, but it turned out that several OPs fail the current RP key rotation certification test, so the new Java version adds a 60-second delay between registering a client and rotating the RP key. 60 seconds is enough for those two OPs to pass the test. [The failing OPs believed they were compliant/certifiable because they falsely believed the RP key rotation was optional - it’s not, it’s the OP key rotation test that’s optional.]

    I felt uncomfortable adding a delay to the test when to me the spec clearly says the signer can start using the new key immediately. I think the absolute minimum I was looking for here was endorsement that the certification test having a delay in it was okay, but I absolutely endorse an errata as you suggest.

  4. Filip Skokan

    “The failing OPs believed they were compliant/certifiable because they falsely believed the RP key rotation was optional - it’s not, it’s the OP key rotation test that’s optional.”

    IIRC the test was still passed in their (or at least my) submission.

  5. Joseph Heenan reporter

    Oh yes, the full situation was more involved than explained in my previous comment. As I understand it, for one OP (yours) the test result was “pass” but the OP had actually failed the test (an invalid_client error was returned); that was a bug in the test that was fixed sometime since 2017. For another OP (Authlete), the test in the certification submission I believe showed as failed, but the certification had been submitted & published anyway. (I’ve not done a thorough analysis of other submissions.) I believe the Python conformance suite CI erroneously skips the RP key rotation test (I’ve not actually checked, but I’m surmising so as I can’t imagine how else this problem could have failed to surface before.)

  6. Michael Jones

    We agreed on the 7-May-20 call that we’re not going to make a normative spec change. We could add non-normative suggestions in errata if there’s consensus on the specific wording.

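As an illustration of the idea in comment 1 of attaching validity bounds to keys, a verifier could filter a JWK Set by time window along these lines. This is a sketch under stated assumptions, not that commenter's implementation: it assumes `nbf` and `exp` are carried as numeric seconds-since-epoch members on each key, borrowing the semantics of the JWT claims of the same names. These members are not defined by the JWK format itself, and the function name `usable_keys` is hypothetical.

```python
from typing import Any, Dict, List


def usable_keys(jwks: Dict[str, Any], now: float) -> List[Dict[str, Any]]:
    """Return the keys whose assumed nbf/exp window contains `now`.

    Keys without nbf or exp are treated as unbounded on that side.
    """
    usable = []
    for key in jwks.get("keys", []):
        nbf = key.get("nbf")  # assumed member: not valid before this time
        exp = key.get("exp")  # assumed member: not valid at or after this time
        if nbf is not None and now < nbf:
            continue
        if exp is not None and now >= exp:
            continue
        usable.append(key)
    return usable
```

Publishing a key with an `nbf` slightly in the future would give verifiers time to fetch it before it comes into use, which addresses the same timing problem as a signer-side wait.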