[Federation] Automatic client registration: Add recommendations to prevent stuck clients

Issue #1751 resolved
Vladimir Dzhuvinov created an issue

The automatic client registration will benefit from guidance to prevent stuck clients due to metadata, policy or multiple resolvable metadata. Stuck clients can potentially occur, because the metadata that an OP creates for the client is not communicated back to the client (as opposed to explicit client registration). For example, for metadata parameters for which the OIDC / OAuth standards don’t specify a default value, or for which the default value in the spec is not among the values allowed by a federation policy and the policy also has no default value, the RP has no good way to find out what value the parameter was given by the OP.

Example scenario of a stuck client due to undefined default value:

OP metadata:

{
 "token_endpoint_auth_methods_supported":["private_key_jwt"],
 "token_endpoint_auth_signing_alg_values_supported":["RS256","RS512","ES256","ES512"],
 ...
}

RP metadata:

{
 "token_endpoint_auth_method":"private_key_jwt"
}

OIDC specifies that token_endpoint_auth_signing_alg, when omitted by the RP, has no default value and the OP is free to pick one that it supports (perhaps guided by the presence of suitable keys in the RP’s jwks). An RP that uses automatic registration has no way to find out what algorithm was picked, so its token request can potentially fail with an invalid_client error.

Ways to prevent this:

1 - The RP can prevent itself from getting stuck by making sure it sets an algorithm in its RP metadata (which must be supported by the OP):

{
 "token_endpoint_auth_method":"private_key_jwt",
 "token_endpoint_auth_signing_alg":"ES256"
}

2 - The federation operator (trust anchor) can prevent RPs from getting stuck by defining a suitable policy which must specify a default value:

"token_endpoint_auth_signing_alg": {
  "default": "RS256",
  "one_of" : ["RS256","ES256"]
}
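
The effect of such a policy can be illustrated with a minimal Python sketch. It assumes a simplified subset of the policy operators (only `default` and `one_of`); the function name `apply_policy` is hypothetical and not taken from the spec or any library.

```python
def apply_policy(metadata, policy):
    """Apply a simplified subset of metadata policy operators (default, one_of)."""
    result = dict(metadata)
    for param, operators in policy.items():
        # 'default' fills in a value only when the RP omitted the parameter.
        if param not in result and "default" in operators:
            result[param] = operators["default"]
        # 'one_of' rejects a present value that is outside the allowed set.
        if param in result and "one_of" in operators:
            if result[param] not in operators["one_of"]:
                raise ValueError(f"{param}={result[param]!r} violates one_of")
    return result


policy = {
    "token_endpoint_auth_signing_alg": {
        "default": "RS256",
        "one_of": ["RS256", "ES256"],
    }
}

# The RP omits the algorithm; the policy default makes the outcome predictable.
rp_metadata = {"token_endpoint_auth_method": "private_key_jwt"}
resolved = apply_policy(rp_metadata, policy)
print(resolved)
```

With the `default` in place, an RP that knows the policy can predict the registered algorithm without the OP reporting it back.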

Example scenario of a stuck client with two trust anchors / intermediates that have different policies:

This can occur when two federations with different policies come together.

Trust anchor A policy:

"token_endpoint_auth_signing_alg": {
  "default": "RS256",
  "one_of" : ["RS256","RS512"]
}

Trust anchor B policy:

"token_endpoint_auth_signing_alg": {
  "default": "ES256",
  "one_of" : ["ES256","ES512"]
}

Here the RP may get stuck in two different ways:

1 - If the RP sets token_endpoint_auth_signing_alg in its metadata, e.g. to RS256, but the OP picks a trust chain leading to trust anchor B, which requires the ESxxx family of algorithms, the auto registration will fail at the policy check step.

2 - If the RP leaves the token_endpoint_auth_signing_alg undefined (to prevent a policy failure), it will register successfully, but it may still fail at the token endpoint with an invalid_client error if the JWT alg is not the expected one.

The way to prevent this:

The RP must use the optional trust_chain parameter in the request, to let the OP know which trust anchor it prefers. The token_endpoint_auth_signing_alg in its metadata may be left undefined (to pick the default value), or specify an alg explicitly.
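
The two failure modes can be reproduced with a small Python sketch. It assumes the two trust anchor policies shown above and a simplified evaluation of `default` and `one_of` only; `resolve_alg`, `outcomes` and the `VIOLATION` marker are hypothetical names used for illustration.

```python
PARAM = "token_endpoint_auth_signing_alg"
VIOLATION = "<policy violation>"  # hypothetical marker for a failed policy check

POLICY_A = {PARAM: {"default": "RS256", "one_of": ["RS256", "RS512"]}}
POLICY_B = {PARAM: {"default": "ES256", "one_of": ["ES256", "ES512"]}}


def resolve_alg(rp_metadata, policy):
    """Return the alg an OP would register under this policy, or VIOLATION."""
    operators = policy.get(PARAM, {})
    # An omitted parameter takes the policy default, if one is defined.
    value = rp_metadata.get(PARAM, operators.get("default"))
    if value is not None and "one_of" in operators and value not in operators["one_of"]:
        return VIOLATION
    return value


def outcomes(rp_metadata, anchors):
    """Evaluate the same RP metadata against every trust anchor policy."""
    return {name: resolve_alg(rp_metadata, policy) for name, policy in anchors.items()}


anchors = {"A": POLICY_A, "B": POLICY_B}

# Case 1: explicit RS256 passes under A but fails the policy check under B.
explicit = outcomes({PARAM: "RS256"}, anchors)

# Case 2: omitted alg registers under both, but with a different alg per anchor.
omitted = outcomes({}, anchors)

print(explicit)
print(omitted)
```

In the second case registration succeeds either way, yet the RP cannot tell which algorithm applies unless it knows which trust chain the OP used, which is what the trust_chain parameter communicates.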

Comments (16)

  1. Michael Jones

    We talked about this on the 9-Dec-22 editors' call. We agree with adding these recommendations. Can you create a PR, Vladimir?

  2. Vladimir Dzhuvinov reporter

    Sure, I will. I decided to propose several important no-nonsense policy design guidelines before delving into the auto-client specific stuff.

    Here is my first stab at the general guidelines for writing OIDC policies:

    1. Ensure the policy for OP or RP metadata is consistent and does not contain
       entries that can potentially result in a JSON object that does not represent
       valid metadata.
    
       Example of consistent RP metadata policy:
    
       ```json
       {
         "token_endpoint_auth_method" : {
            "value" : "private_key_jwt"
         },
         "token_endpoint_auth_signing_alg" : {
            "default" : "RS256",
            "one_of" : [ "RS256", "RS384", "RS512" ]
         }
       }
       ```
    
       Example of a conflicting policy for RP metadata where the required
       `private_key_jwt` authentication method at the token endpoint is not
       compatible with the allowed JWS algorithms:
    
       ```json
       {
         "token_endpoint_auth_method" : {
            "value" : "private_key_jwt"
         },
         "token_endpoint_auth_signing_alg" : {
            "default" : "HS256",
            "one_of" : [ "HS256", "HS384", "HS512" ]
         }
       }
       ```
    
    
    
    1. Ensure the OP metadata policy and RP metadata policy align, are compatible
       with one another and do not contain conflicting entries that can potentially
       prevent RPs from registering with OPs.
    
       Examples of compatible OP and RP policies, for the OP:
    
       ```json
       {
         "id_token_signing_alg_values_supported" : {
            "subset_of" : [ "RS256", "RS384", "RS512" ]
         }
       }
       ```
    
       For the RP:
    
       ```json
       {
         "id_token_signed_response_alg" : {
            "default" : "RS256",
            "one_of"  : [ "RS256", "RS384", "RS512" ]
         }
       }
       ```
    
       Examples of incompatible OP and RP policies, for the OP:
    
       ```json
       {
         "id_token_signing_alg_values_supported" : {
            "subset_of" : [ "RS256", "RS384", "RS512" ]
         }
       }
       ```
    
       For the RP:
    
       ```json
       {
         "id_token_signed_response_alg" : {
            "default" : "ES256",
            "one_of"  : [ "ES256", "ES384", "ES512" ]
         }
       }
       ```
    
    1. Ensure all policy entries for OP or RP metadata parameters that have a
       default value specified in OpenID Connect have those values applied with the
       `default` policy operator. This will prevent false policy negatives for 
       operators like `one_of` when the RP metadata omits a parameter implying its 
       default value.
    
       Example for an ID token signing algorithm policy that ensures the default
       `RS256` value specified in OpenID Connect is applied before performing
       other checks:
    
       ```json
       {
         "id_token_signed_response_alg" : {
            "default" : "RS256",
            "one_of"  : [ "RS256", "RS384", "RS512" ]
         }
       }
       ```
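
The three guidelines above could be checked mechanically. The following Python sketch is only an illustration under stated assumptions: `lint_policies` is a hypothetical helper, it covers just the parameters used in the examples, and its third check is a broadened heuristic (it flags any `one_of` without a `default`, not only parameters with an OpenID Connect default).

```python
def lint_policies(op_policy, rp_policy):
    """Return warnings for a few policy design pitfalls (illustrative, not exhaustive)."""
    warnings = []

    # Guideline 1: private_key_jwt signs with an asymmetric key, so HMAC (HS*) algs conflict.
    auth_method = rp_policy.get("token_endpoint_auth_method", {}).get("value")
    auth_algs = rp_policy.get("token_endpoint_auth_signing_alg", {}).get("one_of", [])
    if auth_method == "private_key_jwt" and any(a.startswith("HS") for a in auth_algs):
        warnings.append("private_key_jwt conflicts with HS* signing algs")

    # Guideline 2: the ID token algs allowed for OPs and for RPs must overlap.
    op_algs = op_policy.get("id_token_signing_alg_values_supported", {}).get("subset_of")
    rp_algs = rp_policy.get("id_token_signed_response_alg", {}).get("one_of")
    if op_algs is not None and rp_algs is not None and not set(op_algs) & set(rp_algs):
        warnings.append("OP and RP ID token signing algs do not overlap")

    # Guideline 3 (broadened heuristic): a one_of without a default leaves omitted values open.
    for param, operators in rp_policy.items():
        if "one_of" in operators and "default" not in operators:
            warnings.append(f"{param}: one_of without default")

    return warnings


op_policy = {
    "id_token_signing_alg_values_supported": {"subset_of": ["RS256", "RS384", "RS512"]}
}
conflicting_rp_policy = {
    "token_endpoint_auth_method": {"value": "private_key_jwt"},
    "token_endpoint_auth_signing_alg": {
        "default": "HS256",
        "one_of": ["HS256", "HS384", "HS512"],
    },
}
print(lint_policies(op_policy, conflicting_rp_policy))
```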
    

  3. Giuseppe De Marco

    thank you Vlad, my notes below

    1. the resolve endpoint mitigates the misalignment issues, giving a way to check how a Federation Entity has applied the policy and derived the final metadata. It’s not a solution to the problem, but it gives a method to diagnose the issues.

    2. we can’t make the trust_chain parameter mandatory in the request, because the authz request belongs to OIDC Core and federation should not break OIDC Core (at least). Federation extends the request with the optional trust_chain parameter, which an OP may or may not use in the trust resolution, or as a hint to update the requester’s trust chain and metadata.

    3. for good interop, especially if an entity uses multiple federations, every participant should make its metadata as explicit as possible. Implicit behaviours or missing parameters may cause interop problems.

    4. for real interop, every TA should make its participants aware of the default policies, algs and parameters that every participant MUST support. Just as we use RS256 as the default alg, every federation should have its defaults and every participant should deal with these.

    will we add a section on implementation considerations, under the metadata policy section?


  4. Roland Hedberg

    To follow up on what Giuseppe says:

    One of the reasons that we added the resolve endpoint was to allow an entity to find out what another entity thought about it. Having that facility you can dynamically find out if the entity’s metadata together with the combined policy of a specific trust chain will actually produce metadata that is usable.

    Since things are inherently dynamic, whatever you do (with policies and metadata) before entering a federation in order to produce good metadata will probably not be valid after a period of time. So I think it’s well worth defining a process to periodically check whether, given a specified trust chain, the resulting entity metadata will be something useful or not.
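
Such a periodic check might look like the following Python sketch. Everything in it is an assumption made for illustration: the `resolve` callable stands in for whatever mechanism returns the resolved metadata for an entity and trust anchor (for example a client of the resolve endpoint), `ResolveError` and `check_usability` are hypothetical names, and the URLs and data are made up.

```python
class ResolveError(Exception):
    """Raised by the (hypothetical) resolver when no usable metadata results."""


def check_usability(entity_id, trust_anchors, resolve, required_params):
    """Report, per trust anchor, why the resolved metadata may not be usable."""
    report = {}
    for anchor in trust_anchors:
        try:
            metadata = resolve(entity_id, anchor)
        except ResolveError as exc:
            report[anchor] = [f"resolution failed: {exc}"]
            continue
        # An empty list means every required parameter is present for this anchor.
        report[anchor] = [f"missing {p}" for p in required_params if p not in metadata]
    return report


def fake_resolve(entity_id, anchor):
    """Stand-in resolver with made-up data; a real one would query the federation."""
    if anchor == "https://ta-b.example":
        raise ResolveError("policy violation")
    return {"token_endpoint_auth_method": "private_key_jwt"}


report = check_usability(
    "https://rp.example",
    ["https://ta-a.example", "https://ta-b.example"],
    fake_resolve,
    ["token_endpoint_auth_method", "token_endpoint_auth_signing_alg"],
)
print(report)
```

Run on a schedule, a non-empty list for any anchor would signal that the metadata has drifted into a state where the client may get stuck.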

  5. Vladimir Dzhuvinov reporter

    Thank you Giuseppe and Roland for your inputs.

    I have been thinking about this issue, a lot.

    Suppose we have two trust anchors (TAs) / federations, with different security requirements on the client authentication at the token endpoint:

    • A: It’s FAPI-like and wants, let’s say, mTLS-bound access tokens, hence mTLS-authenticated clients. Enforced by token_endpoint_auth_method policy.
    • B: It’s okay with only private_key_jwt client auth. Enforced by token_endpoint_auth_method policy.

    Then we have an RP (auto client) which is part of both federations, and sometimes wants to call at OPs in federation A, and other times in federation B.

    What kind of RP metadata should it publish for the token_endpoint_auth_method parameter if there is no trust_chain parameter passed?

    • The RP cannot publish metadata with an explicit token_endpoint_auth_method , because this will break the policy validation with either federation A or B.
    • So, this means both federations must publish a default policy for their token_endpoint_auth_method in case there are RPs belonging to other unknown federations. Else the OIDC default token_endpoint_auth_method will kick in, which is client_secret_basic and is guaranteed to break with both A and B :)
    • But then the RP calls at an OP from which it wants its federation A (FAPI) aspect, but which happens to also be a member of B (the RP need not know this).

    My question to you is, can the resolve endpoint really help here and how? Where should this resolve endpoint be hosted, and when and how is it to be called?

    I want to point out here that for an RP to get this far without getting stuck (policy violation), every TA when designing its token_endpoint_auth_method policy must take into account the possibility that RPs and OPs may participate in other federations. And thus provide a default policy for the token_endpoint_auth_method, not just a one_of. This foresight and care may not be there when a federation operator sits down to formulate its metadata policies.

    The core issue here stems from the fact that automatic clients have their metadata published at a well-known and presumably static endpoint. So, if the RP wants to belong to 2 or more federations, this published metadata must become the least common denominator and this may not always work across all, especially if a federation’s policy is not accommodating towards such multi-anchored RPs (and OPs). To overcome this the RP must resort to some way to “influence” its registration. For an auto registered RP this at present means passing a trust_chain. Or, switching to explicit registration. Or, this is also a possibility - to set up two different client_id URLs and hence well-known federation EC endpoints. Are there other possibilities?

    My objective is to add some simple guidance in the OIDC Federation 1.0 spec how to deal with that, fool-proof and with minimum hassle. Ideally with what is already available in the spec.

  6. Giuseppe De Marco

    What kind of metadata should the RP consider for different federation policies?

    1. The RP should resolve its final metadata to a TA, and check if the final metadata is compliant to its interop specs
    2. if the OP has the resolve endpoint, the RP may check this also to the OP

    The point you raise is very important and should be explained in the implementation considerations of the specs, so I agree with you.

    there are many implementation choices to deal with this scenario; I would assume, for example, enabling or disabling some internal components for one or more TAs (a sort of mapping in the general settings of the implementation)

    the fact is that an entity that adheres to two federations must certainly support what is defined by both, and if it enters with a single EC, this must cover both cases

    Exactly as you say, it would be better to make the federation policy explicit for all recognizable claims, so as to make the interop requirements transparent

    I don't see any MUST in this, only good implementation practices to solve problems, so if we agree, this discussion could have the objective of enriching the section on implementation considerations

  7. Vladimir Dzhuvinov reporter

    > The RP should resolve its final metadata to a TA, and check if the final metadata is compliant to its interop specs

    Done by the RP developer or admin, prior to publishing its EC, right? Should it also be done repeatedly by periodic script, etc?

    > if the OP has the resolve endpoint, the RP may check this also to the OP

    Should this be done by RP code before calling an OP, or only after an error (stuck client) is encountered by an RP admin?

    For everything we put down in the spec, I want to see how it will work in practice, and who is going to be performing these actions (normative or not) - the code or an admin / developer. I personally am no fan of manual labour and believe a good library or product should automate things as much as possible, to prevent mistakes and people having to call support. I have sympathy for Dr. Fett’s occasional rants that OIDC / OAuth lib maintainers are not doing enough to simplify things for app developers :)

    I meant this to be a “consideration”, not a MUST / MUST NOT, just as you say.

  8. Michael Jones

    As discussed on the 23-Mar-23 working group call, this appears to be ready for a pull request. Some of the text in the discussion above can be used in the PR.

  9. Giuseppe De Marco

    > Done by the RP developer or admin, prior to publishing its EC, right? Should it also be done repeatedly by periodic script, etc?

    not entirely true, since a single RP with a single EC can be part of multiple federations at the same time, where different policies would be applied.
    A deployer should verify its final metadata against all its trust anchors. It is certainly good practice to check/renew the trust chain relating to your entities on a daily basis and derive the delta, the differences between the last valid (old) one and the new one. In an ordinary everyday situation only exp should show up as changed.
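
Deriving that delta can be sketched in a few lines of Python. This is only an illustration: `claims_delta` and `unexpected_changes` are hypothetical helpers, the claim values are made up, and the comparison is done on flat dictionaries of already-resolved claims.

```python
def claims_delta(old, new):
    """Return {claim: (old_value, new_value)} for every claim that differs."""
    delta = {}
    for key in sorted(old.keys() | new.keys()):
        # A claim absent on one side is reported with None on that side.
        if old.get(key) != new.get(key):
            delta[key] = (old.get(key), new.get(key))
    return delta


def unexpected_changes(delta, expected=frozenset({"exp"})):
    """List changed claims other than those expected to change on every renewal."""
    return [claim for claim in delta if claim not in expected]


# Made-up resolved claims from yesterday's and today's trust chain evaluation.
old = {"exp": 1000, "token_endpoint_auth_signing_alg": "RS256"}
new = {"exp": 2000, "token_endpoint_auth_signing_alg": "RS256"}

delta = claims_delta(old, new)
print(delta)
print(unexpected_changes(delta))
```

Any claim reported by `unexpected_changes` would be a reason to re-check the resolved metadata before the RP calls an OP again.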

    > Should this be done by RP code before calling an OP, or only after an error (stuck client) is encountered by an RP admin?

    The resolve endpoint might have different purposes and scopes; I still consider it more a diagnostic tool than a trust evaluation tool.
    The problem it solves is diagnosing other implementations, different from yours, that process the trust chain related to your entities and that may have implementation errors in deriving the final metadata.

    If an OP has a faulty implementation and produces final metadata different from what you evaluate for your own instances, the resolve endpoint gives us this evidence. This kind of automation reduces the cost of end-to-end communication between different organizations when investigating a fault or a metadata misalignment.

    Yes, I fully agree with Vlad and Daniel :-)
