Attachments with evidence documents

Issue #1209 resolved
Kai Lehmann created an issue

Account providers often outsource the identification of the end-user to third-party ident service providers. I was wondering whether this protocol could be used to transfer the identities from the ident service providers to the account providers. These transferred datasets usually also include a set of files (video recordings of the video ident process, screenshots of the ID document, signed documents of the person performing the in-person identification, documents certifying a company’s entry in a public legal register, …). Those files are usually sent to the account provider (the client in this case) along with the end-user’s personal data. The documents need to be signed as well (e.g. by including the SHA-256 hash of each document in the JWT, which in turn is signed).
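To make the signing idea concrete, here is a minimal Python sketch using only the standard library: it computes a SHA-256 digest per file, puts the digests into the JWT claims and signs the token. The claim names (`attachments`, `file_name`, `sha256`) are hypothetical, and HS256 is used only to keep the sketch self-contained; a real ident service provider would more likely sign with an asymmetric algorithm such as RS256 or ES256.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in the JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, key: bytes) -> str:
    """Produce a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def build_claims(personal_data: dict, files: dict[str, bytes]) -> dict:
    # Hypothetical claim layout: one entry per file with its SHA-256 digest.
    attachments = [
        {"file_name": name, "sha256": b64url(hashlib.sha256(content).digest())}
        for name, content in files.items()
    ]
    return {**personal_data, "attachments": attachments}


def verify_file(token: str, key: bytes, name: str, content: bytes) -> bool:
    """Check the token signature, then check one file against its signed digest."""
    signing_input, _, sig = token.rpartition(".")
    expected = b64url(hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    digest = b64url(hashlib.sha256(content).digest())
    return any(a["file_name"] == name and a["sha256"] == digest for a in payload["attachments"])
```

The account provider stores the files and the token together; any later modification of a file, or of the digest list in the token, makes verification fail.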

It might also make sense to provide those attachments as evidence to other parties downstream, when requested. We might want to think of an extension to the protocol allowing for attaching those evidence documents.

Comments (28)

  1. Mark Haine

    Just combining the duplicate ticket #1211. Below is the description I wrote in the duplicate issue…

    A fairly common requirement for eKYC use cases is that a photograph or scan of an identity document and/or a photograph of the individual being verified is returned as part of the evidence set.

    At present it appears that all elements of the eKYC response are assumed to be contained in a JSON object or JWT (including aggregated and distributed claims). Due to the size of bitmap images this could result in very large JSON objects being returned that contain base64 encoded scans or photographs.
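    For a rough sense of “very large”: standard base64 turns every 3 input bytes into 4 output characters, so an embedded image grows by about a third. A quick Python check (the input sizes are arbitrary examples):

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Compare the formula with the real encoder for a few image-sized inputs.
for size in (1, 2, 3, 500_000, 3_000_000):
    encoded = base64.b64encode(bytes(size))
    assert len(encoded) == base64_length(size)
    print(f"{size:>9} bytes -> {len(encoded):>9} base64 characters")
```

    If the JSON is then carried as the payload of a signed JWT, it is base64url-encoded once more, so an embedded image ends up at roughly 4/3 × 4/3 ≈ 1.78 times its binary size.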

    Is there merit in having the option to return a distributed ‘style’ claim that refers to the location of a bitmap image in one of the common binary image file formats?

  2. Kai Lehmann reporter

    As discussed, here are examples of the APIs I know of. They are proprietary, but I have tried to replicate them as closely as I can. For readability I omitted a lot of the usually obligatory headers (Authorization, Content-Length, etc.).

    multipart/related

    The first API uses a multipart/related POST, so the data is actually pushed to the receiver instead of the receiver pulling it from the provider. The first part of the multipart body is either an XML document or a signed JWT containing the end user’s personal data and a list of references to the documents/images/videos attached in the multipart/related POST. In the XML case, XMLDSig is used to sign the XML, including the hashes of the attachments. I’ll give the example for the XML variant; the JSON variant has a similar structure:

    POST /verifications/:orderId
    Content-Type: Multipart/Related; boundary=abcdef1234
    
    --abcdef1234
    Content-Type: application/xml
    
    <?xml ... ?>
    <verified>
      <orderId>xxxx</orderId>
      <user><firstName>...</firstName>...</user>
      <attachments>
        <attachment>
          <id>94b31bf8-52c3-4edb-9d11-a34ba68973aa</id>
          <digest method="sha1">a9993e364706816aba3e25717850c26c9cd0d89d</digest>
        </attachment>
      </attachments>
      <ds:Signature>...</ds:Signature>
    </verified>
    --abcdef1234
    Content-Type: image/jpeg
    Content-ID: <94b31bf8-52c3-4edb-9d11-a34ba68973aa>
    Content-Disposition: attachment; filename="passport.jpg"
    
    <binaryblob>
    --abcdef1234--
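    A minimal Python sketch of both ends of this push exchange, using only the standard `email` package. It assumes the receiver has already checked the XML signature and extracted the declared digests into a dict keyed by Content-ID; labelling every attachment `image/jpeg` and using SHA-1 simply mirror the example above.

```python
import email
import hashlib
from email import encoders
from email.mime.application import MIMEApplication
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def build_related(xml_doc: bytes, attachments: dict[str, tuple[str, bytes]]) -> bytes:
    """Sender side: signed XML first, then one part per attachment.

    attachments maps Content-ID -> (file name, binary content).
    """
    msg = MIMEMultipart("related", type="application/xml")
    msg.attach(MIMEApplication(xml_doc, "xml"))
    for cid, (filename, content) in attachments.items():
        part = MIMEBase("image", "jpeg")
        part.set_payload(content)
        encoders.encode_base64(part)  # binary parts are transfer-encoded as base64
        part.add_header("Content-ID", f"<{cid}>")
        part.add_header("Content-Disposition", "attachment", filename=filename)
        msg.attach(part)
    return msg.as_bytes()


def verify_related(raw: bytes, declared: dict[str, str]) -> bool:
    """Receiver side: compare each attachment's SHA-1 with the digest from the signed XML."""
    msg = email.message_from_bytes(raw)
    parts = msg.get_payload()  # list of sub-messages for a multipart body
    actual = {}
    for part in parts[1:]:  # parts[0] is the signed XML document
        cid = part["Content-ID"].strip("<>")
        actual[cid] = hashlib.sha1(part.get_payload(decode=True)).hexdigest()
    # Every declared attachment must be present and unmodified, and nothing extra.
    return actual == declared
```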
    

    The next example uses a pull interface to retrieve the signed XML/JWT:

    Request:
    
    GET /verified/:orderId
    
    Response:
    
    Content-Type: application/xml
    
    <?xml ... ?>
    <verified>
      <orderId>xxxx</orderId>
      <user><firstName>...</firstName>...</user>
      <attachments>
        <attachment>
          <id>94b31bf8-52c3-4edb-9d11-a34ba68973aa</id>
          <digest method="sha1">a9993e364706816aba3e25717850c26c9cd0d89d</digest>
        </attachment>
      </attachments>
      <ds:Signature>...</ds:Signature>
    </verified>
    
    Request to retrieve attachment:
    
    GET /verified/:orderId/attachments/:attachmentId
    
    Response:
    
    Content-Type: image/jpeg
    Content-Disposition: attachment; filename="passport.jpg"
    
    <binaryblob>
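    The pull variant can be sketched the same way. Here the digests are read from the XML with `xml.etree.ElementTree`, and `fetch` is a hypothetical stand-in for an HTTP client that sends the usual Authorization header. XML signature validation is again assumed to have happened already (the `ds:` prefix in the example would also need its namespace declared before the document parses).

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Callable


def declared_digests(xml_doc: str) -> dict[str, tuple[str, str]]:
    """Map attachment id -> (digest method, hex digest) from the <attachments> list."""
    root = ET.fromstring(xml_doc)
    result = {}
    for att in root.iter("attachment"):
        digest = att.find("digest")
        result[att.findtext("id")] = (digest.get("method"), digest.text)
    return result


def pull_and_verify(base_url: str, order_id: str, xml_doc: str,
                    fetch: Callable[[str], bytes]) -> dict[str, bytes]:
    """Fetch every declared attachment and check it against its signed digest."""
    verified = {}
    for att_id, (method, expected) in declared_digests(xml_doc).items():
        url = f"{base_url}/verified/{order_id}/attachments/{att_id}"
        blob = fetch(url)
        if hashlib.new(method, blob).hexdigest() != expected:
            raise ValueError(f"digest mismatch for attachment {att_id}")
        verified[att_id] = blob
    return verified
```

    The sample digest above, a9993e364706816aba3e25717850c26c9cd0d89d, is the SHA-1 of the three bytes `abc`, which makes it easy to test with a fake fetcher. SHA-1 is no longer considered collision-resistant, so SHA-256 (as suggested in the issue description) is the safer choice for new designs.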
    

  3. Mark Haine

    Another option is to include the document images base64-encoded (I truncated the base64 data below). This is based on an interface provided as part of a vendor solution I am working with in a project.

    Unfortunately, this can result in some very large JSON payloads:

    {
       "verified_claims":{
          "verification":{
             "trust_framework":"jp_eKYC",
             "time":"2012-04-23T18:25Z",
             "verification_process":"f24c6f-6d3f-4ec5-973e-b0d8506f3bc7",
             "evidence":[
                {
                   "type":"id_document",
                   "method":"pipp",
                   "time": "2012-04-22T11:30Z",
                   "document":{
                      "type":"idcard",
                      "issuer":{
                         "name":"Japan Post",
                         "country":"JP"
                      },
                      "documentFrontImage":"/9j/4AAQS**TRUNCATED**oqJ//9k=",
                      "documentBackImage":"/9j/4AAQS**TRUNCATED**xb1P/2Q==",
                      "number":"53554554",
                      "date_of_expiry":"2020-03-22"
                   }
                }
             ]
          },
          "claims":{
             "given_name":"Yuma",
             "family_name":"Masumoto"
          }
       }
    }
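    Before storing such a response, the receiver has to decode the embedded images. A small Python sketch; the rule that image fields end in `Image` and the magic-number table are assumptions based on this vendor example. The truncated values above are not valid base64, so the sketch skips them.

```python
import base64
import binascii

# Leading bytes of a few common formats (assumed to be the ones worth detecting).
MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
}


def sniff(data: bytes) -> str:
    """Guess the media type from the first bytes of the decoded content."""
    for prefix, media_type in MAGIC.items():
        if data.startswith(prefix):
            return media_type
    return "application/octet-stream"


def inline_images(document: dict) -> dict[str, tuple[str, int]]:
    """Decode the *Image fields of a document object: field -> (media type, byte size)."""
    found = {}
    for key, value in document.items():
        if key.endswith("Image") and isinstance(value, str):
            try:
                data = base64.b64decode(value, validate=True)
            except binascii.Error:
                continue  # skip values that are not valid base64 (e.g. truncated)
            found[key] = (sniff(data), len(data))
    return found
```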
    

  4. Julian White

    I couldn’t paste in an exact example, but a rough version of it looks like this:

    <TextData>
        <BinaryAttachment>
            <BinaryID>1238838492834710934013</BinaryID>
            <BinaryHash>asdHWUchakenfpaneunvlekjflknalknfliajslknfvpigfnv23najcwk4jbflas</BinaryHash>
        </BinaryAttachment>
    </TextData>
    <TextDataSignature>3194hnfcoq834hfoq873h48hoq384yjo9q8od939</TextDataSignature>
    
    <BinaryData>
        <BinaryID>1238838492834710934013</BinaryID>
        <BinaryType>PDF</BinaryType>
        <Binary>/4moxUnf8AE1GKZQopF6/lSikXr+VAD5OtPWmSdaetS9xCP0/KmHtT36flTD2qhIF60LQvWhamRQknSmU+TpTKnoAq06mrTqpbALJQen4USUHp+FMBp/rQn+f1oP8AWhP8/rUdQHf4UL3o/wAKF71LAig6D/Pc1bj/AMKqQdB/nuatx/4UAVrj/WL9H/mtWk7fWqtx/rF+j/zWrSdvrTAifv8AWpD2qN+/1qQ9qrsJkM/WkpZ+tJSe4yRup/z6UkfUfh/Wlbqf8+lJH1H4f1pAOuOn4GuNvfvt9a7K46fga429++31rmq7oqJ//9k=</Binary>
    </BinaryData>
    

    The signed part of the message contains a reference to a binary object and its hash. The object is then appended to the message or sent as a separate message as base64 encoded data.

  5. Torsten Lodderstedt

    Idea discussed and rejected: add a link/id to the JSON object and provide the image in a multipart UserInfo response.

    Argument against: the OpenID OP would need to be able to handle multipart messages for the UserInfo endpoint (and potentially the Token Endpoint).

  6. Kai Lehmann reporter

    I have been working on several options for including attachments. We already discussed this a bit and found that the evidence section would fit best. Here are a few alternatives showing how this could look:

    Option A: Have an evidence type “attachments” and list the attachments inside:

    {
      "verified_claims": {
        "verification": {
          "trust_framework": "de_aml",
          "time": "2012-04-23T18:25Z",
          "verification_process": "f24c6f-6d3f-4ec5-973e-b0d8506f3bc7",
          "evidence": [
            {
              "type": "id_document",
              "method": "pipp",
              "time": "2012-04-22T11:30Z",
              "document": {
                "type": "idcard",
                "issuer": {
                  "name": "Stadt Augsburg",
                  "country": "DE"
                },
                "number": "53554554",
                "date_of_issuance": "2010-03-23",
                "date_of_expiry": "2020-03-22"
              }
            },
            {
              "type": "attachments",
              "attachments": [
                /* attachments go here */
              ]
            }
          ]
        },
        /* ... */
      }
    }
    

    Option B: Have an element inside of evidence for each attachment:

    {
      "verified_claims": {
        "verification": {
          "trust_framework": "de_aml",
          "time": "2012-04-23T18:25Z",
          "verification_process": "f24c6f-6d3f-4ec5-973e-b0d8506f3bc7",
          "evidence": [
            {
              "type": "id_document",
              "method": "pipp",
              "time": "2012-04-22T11:30Z",
              "document": {
                "type": "idcard",
                "issuer": {
                  "name": "Stadt Augsburg",
                  "country": "DE"
                },
                "number": "53554554",
                "date_of_issuance": "2010-03-23",
                "date_of_expiry": "2020-03-22"
              }
            },
            {
              "type": "attachment",
              /* attachment data/info goes here */
            },
            {
              "type": "attachment",
              /* 2nd attachment data/info goes here */
            }
          ]
        },
        /* ... */
      }
    }
    

    Another option I am considering is to have an attachments section underneath verification itself like so:

    {
      "verified_claims": {
        "verification": {
          "trust_framework": "de_aml",
          "time": "2012-04-23T18:25Z",
          "verification_process": "f24c6f-6d3f-4ec5-973e-b0d8506f3bc7",
          "evidence": [
            {
              "type": "id_document",
              "method": "pipp",
              "time": "2012-04-22T11:30Z",
              "document": {
                "type": "idcard",
                "issuer": {
                  "name": "Stadt Augsburg",
                  "country": "DE"
                },
                "number": "53554554",
                "date_of_issuance": "2010-03-23",
                "date_of_expiry": "2020-03-22"
              }
            }
          ],
          "attachments": [
            /* data of attachments goes here */
          ]
        },
        /* ... */
      }
    }
    

    In all of the examples shown, the attachments are somewhat decoupled from the actual evidence type (in the examples above, the “document” type). This allows returning attachments without specifying another evidence element; the trust framework specification is enough in those cases anyway. If we want tight coupling between evidence types and their attachments, the attachments could be made a subsection of the respective evidence type:

    {
      "verified_claims": {
        "verification": {
          "trust_framework": "de_aml",
          "time": "2012-04-23T18:25Z",
          "verification_process": "f24c6f-6d3f-4ec5-973e-b0d8506f3bc7",
          "evidence": [
            {
              "type": "id_document",
              "method": "pipp",
              "time": "2012-04-22T11:30Z",
              "document": {
                "type": "idcard",
                "issuer": {
                  "name": "Stadt Augsburg",
                  "country": "DE"
                },
                "number": "53554554",
                "date_of_issuance": "2010-03-23",
                "date_of_expiry": "2020-03-22"
              },
              "attachments": [
                /* data of attachments goes here */
              ]
            }
          ]
        },
        /* ... */
      }
    }
    

    However, I consider attachments to be artifacts of the verification process/method, not of the documents. In some cases an attachment could be a PDF of a filled-in verification form signed by the person performing the verification.

    Due to this reasoning (attachments being artifacts of the verification process) I personally favor the attachments section underneath “verification” itself.

    @Torsten Lodderstedt Before I go ahead with writing the spec, I would like your opinion on that.

  7. Torsten Lodderstedt

    One use case we should support is the transfer of a copy of the id document used in a verification. Several regulations, including AML and telecommunications regulations, require a verifier/IDP to provide this to the RP. In this case the attachment should be part of the id_document evidence element, since it relates to a particular document.

    That might be the case for utility_bill as well.

    Attachments might be useful in other contexts as well (like you mentioned).

    I suggest defining an abstract attachment element type and adding it to the structure as needed.

  8. Kai Lehmann reporter

    The id document may not be just one attachment, but two (images of the front and the back, respectively). So even attaching it to a specific evidence element requires an “attachments” sub-element with one or more attachment sub-elements.

    At least for our processes where we need this, a generic list of attachments is usually enough, as all attachments are looked at and verified during audits. I understand that there is a need to specifically identify the attachments containing images of the id document for the evidence. On the other hand I don’t like having the attachments being scattered around in the JSON object structure. I would prefer OPs to know clearly where to place those attachments and RPs to know where to expect them.

    So how about having references to particular attachments for specific evidence types like so:

    {
      "verified_claims": {
        "verification": {
          "trust_framework": "de_aml",
          "time": "2012-04-23T18:25Z",
          "verification_process": "f24c6f-6d3f-4ec5-973e-b0d8506f3bc7",
          "evidence": [
            {
              "type": "id_document",
              "method": "pipp",
              "time": "2012-04-22T11:30Z",
              "document": {
                "type": "idcard",
                "issuer": {
                  "name": "Stadt Augsburg",
                  "country": "DE"
                },
                "number": "53554554",
                "date_of_issuance": "2010-03-23",
                "date_of_expiry": "2020-03-22",
                "attachments": ["46de4e39-4d41-4bde-8073-4a77d4083c89", "89b0473f-2bd5-4d2c-8ff6-3c5eae40681e"]
              }
            }
          ],
          "attachments": [
            {
              "id": "46de4e39-4d41-4bde-8073-4a77d4083c89",
              "desc": "Front of id document",
              "content_type": "image/png",
              "content": "iVBORw0KGgoAAAANSUhEUgAAAAIAAAACCAYAAABytg0k...AErkJggg=="
            },
            {
              "id": "89b0473f-2bd5-4d2c-8ff6-3c5eae40681e",
              "desc": "Back of id document",
              "content_type": "image/png",
              "content": "a2pzZGZoamsgd2pybHdlanJ3ZWwga3Jqd2wgcndqw7Z3...3cbzN+Cg=="
            },
            { /* ... */ }
          ]
        },
        /* ... */
      }
    }
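    With references, the receiver resolves each id against the verification-level list. A minimal Python sketch, assuming (as in the example) that references appear only under `document.attachments` and that every referenced attachment carries an `id`:

```python
def resolve_attachments(verification: dict) -> dict[str, list[dict]]:
    """Map each evidence element ("index:type") to the attachment objects it references."""
    by_id = {a["id"]: a for a in verification.get("attachments", []) if "id" in a}
    resolved = {}
    for i, evidence in enumerate(verification.get("evidence", [])):
        refs = evidence.get("document", {}).get("attachments", [])
        missing = [r for r in refs if r not in by_id]
        if missing:
            # A dangling reference means the response is inconsistent.
            raise KeyError(f"evidence[{i}] references unknown attachments: {missing}")
        resolved[f"{i}:{evidence['type']}"] = [by_id[r] for r in refs]
    return resolved
```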
    

    Regarding the attachment element itself, I propose the following two options:

    For embedded binary data:

    • content_type: Content type of the attachment
    • content: Base64-encoded binary content of the attachment
    • desc: (optional) Short description of the attachment, could be file name
    • id: (optional) unique identifier of the attachment for referencing purposes if needed

    For external binary data:

    • digest: Base64-encoded hash value of the binary data
    • endpoint: URL for retrieving the attachment
    • access_token: (optional) access token to be used for retrieving the attachment. If omitted, the access token issued by the token endpoint is to be used
    • expires_in: (optional) number of seconds until access_token expires
    • desc: (optional) Short description of the attachment, could be file name
    • id: (optional) unique identifier of the attachment for referencing purposes if needed

    The digest algorithm will be announced via OIDC discovery or selected via the OIDC client registration and management interface.
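    The two representations could be modelled roughly as follows in Python. This is a sketch of the proposal, not a defined API: the class names are hypothetical, `expires_in` is turned into an absolute time at parse time, and `sha256` is only a placeholder default because the proposal leaves the algorithm to discovery or client registration. Falling back to the token endpoint's access token when `access_token` is absent is left to the caller.

```python
import base64
import hashlib
import time
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class EmbeddedAttachment:
    content_type: str
    content: bytes                      # already base64-decoded
    desc: Optional[str] = None
    id: Optional[str] = None


@dataclass
class ExternalAttachment:
    digest: bytes                       # already base64-decoded
    endpoint: str
    access_token: Optional[str] = None  # None: use the token endpoint's token
    expires_at: Optional[float] = None  # derived from expires_in
    desc: Optional[str] = None
    id: Optional[str] = None

    def verify(self, data: bytes, alg: str = "sha256") -> bool:
        """Check downloaded bytes against the digest; alg is a placeholder default."""
        return hashlib.new(alg, data).digest() == self.digest


def parse_attachment(obj: dict, now: Optional[float] = None) -> Union[EmbeddedAttachment, ExternalAttachment]:
    """Tell the two forms apart by the presence of "content" or "endpoint"."""
    if "content" in obj:
        return EmbeddedAttachment(
            content_type=obj["content_type"],
            content=base64.b64decode(obj["content"]),
            desc=obj.get("desc"),
            id=obj.get("id"),
        )
    if "endpoint" in obj:
        now = time.time() if now is None else now
        expires_in = obj.get("expires_in")
        return ExternalAttachment(
            digest=base64.b64decode(obj["digest"]),
            endpoint=obj["endpoint"],
            access_token=obj.get("access_token"),
            expires_at=None if expires_in is None else now + expires_in,
            desc=obj.get("desc"),
            id=obj.get("id"),
        )
    raise ValueError("attachment has neither 'content' nor 'endpoint'")
```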

  9. Torsten Lodderstedt

    On the other hand I don’t like having the attachments being scattered around in the JSON object structure.

    Can you please explain why? In my opinion, the attachment construct is an element type that can be used where needed, just like the time element type.

    I suggest defining schema types for the two representations and then referring to these definitions in the respective element definitions in the verified_claims schema.

    So how about having references to particular attachments for specific evidence types like so:

    The spec so far always maps relationships by containment. I don’t see a reason to deviate from this principle and therefore prefer to have the ability to add attachments where needed. I think we have identified two locations, the verification element and the evidence element. That does not seem overly complicated, but it gives us enough options to get implementer feedback and start (prototype) implementations.

  10. Kai Lehmann reporter

    Can you please explain why?

    I’m wearing my implementer’s hat here. It makes things easier when parsing a structure and iterating through all the attachments in order to store them in a database or storage system. If I, as an implementer, know the attachments are at a specific location in the structure, I just need to look there. Things get more complicated when I have to look for them at different places inside the JSON/data structure. It also complicates the models and their serialization code. In the end, all attachments will be stored together (again, I’m just talking about my experience; others’ may differ). We do this in our implementations as well: we have identifiers (e.g. primary keys in a database table) for each attachment and associations between them and the process they belong to. Those attachments are mostly looked at during audits of the process.
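    For comparison, a generic traversal can collect attachments wherever they sit in the structure, which reduces the parsing burden though not the modelling one. A Python sketch, assuming every attachment lives in an array named `attachments`; string entries (id references, as in the earlier proposal) are skipped:

```python
from typing import Iterator


def iter_attachments(node, path: str = "$") -> Iterator[tuple[str, dict]]:
    """Yield (JSON path, attachment object) for every entry of any "attachments" array."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "attachments" and isinstance(value, list):
                for i, att in enumerate(value):
                    if isinstance(att, dict):  # skip id references such as ["46de..."]
                        yield f"{child}[{i}]", att
            else:
                yield from iter_attachments(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_attachments(item, f"{path}[{i}]")
```

    The path can be stored next to each attachment, so the original location is not lost when everything ends up in one table.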

    Not sure if the comparison to a time type fits. A time type is just a string (in JSON), and in a database a time is simply a column in a table.

    Another reason to have the attachments in one place only applies when we embed the binary data in the JSON itself. Such JSON will be difficult for humans to read, as the attachments and their base64 binary data are scattered around and disrupt the reading flow. In an IDE I could simply collapse the attachments section if it is in one place only; otherwise I have to click some more.

    And the third reason would be that RPs can decide if they want the collected attachments by stating this in the verified_claims request:

    {
      "verification": {
        "attachments": null
      }
    }
    

    If attachments can be at several locations, the RP needs to specify this at each of those locations. I also recommend that the OP show on its consent page when attachments are being sent to the RP, as the information therein will most probably go beyond what the RP has requested with the claims themselves.

    Following the containment principle is of course a valid point. Just saying that it complicates the things I mentioned, but it won’t be a show stopper for implementers. I will continue with defining the schemas and examples.

  11. Torsten Lodderstedt

    re third reason: I assume a certain deployment, scheme or trust framework would clearly define where attachments can turn up and for what purpose, so the RP would know exactly where to put the null. BTW: I would assume the attachment element to have a specific name, such as “document_scan”.

    I understand the advantages of having all attachments in one place. To me it feels like introducing multi part messages in JSON. The scattering of binary data won’t happen in cases where the scan, for example, is referenced instead of being included. So your arguments apply for inclusion by value only.

  12. Mark Haine

    I prefer the approach where relationships are mapped by containment. I also think that for a given implementation it is highly likely that the locations of binary “attachments” will be known, so the RP will not have to look for them (especially if it has stated that it wishes to receive them).

  13. Kai Lehmann reporter

    @Torsten Lodderstedt I pushed my changes to the schema and added examples in the branch https://bitbucket.org/openid/ekyc-ida/branch/attachments and included attachments at the verification level as well as for evidence of types id_document and utility_bill. This will require the RP to add attachments: null in several places if it wants all possible attachments. Of course, transferring attachments may also be implicitly required by requesting a specific trust framework.
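    On the RP side, asking for all of them means putting a null at each location. A hypothetical Python sketch that builds such a request; the overall shape follows the request example earlier in this thread, while the `{"value": ...}` filter on the evidence type is an assumed syntax:

```python
import json

# Hypothetical: the evidence types that carry attachments in the branch described above.
EVIDENCE_TYPES_WITH_ATTACHMENTS = ("id_document", "utility_bill")


def attachments_request() -> dict:
    """Build a verified_claims request that asks for attachments at every location.

    Python's None is serialized as JSON null, the "please return this" marker.
    """
    return {
        "verified_claims": {
            "verification": {
                "trust_framework": None,
                "evidence": [
                    # Assumed syntax for selecting an evidence type.
                    {"type": {"value": t}, "attachments": None}
                    for t in EVIDENCE_TYPES_WITH_ATTACHMENTS
                ],
                "attachments": None,  # verification-level attachments
            },
            "claims": {"given_name": None, "family_name": None},
        }
    }


print(json.dumps(attachments_request(), indent=2))
```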

  14. Torsten Lodderstedt

    great! thanks! May I ask you to also create a Pull Request? I can also do this if you like.

  15. Kai Lehmann reporter

    The change is not yet ready for a PR. The specification itself needs to be enhanced to incorporate attachments. Unless of course we want to use the PR for discussions?

  16. Torsten Lodderstedt

    I thought you wanted to start the discussion about your proposal. I think a PR would be a better place for that than the ticket.

  17. Dima Postnikov

    Any privacy considerations relevant to evidence documents? E.g.: End user might agree to share only some of the elements disclosed on a full document image.

  18. Kai Lehmann reporter

    @Nat Sakimura Thanks for this gem. Will incorporate it.

    @Dima Postnikov Yes, I added a section for this in the PR already. Have a look and let me know if that would be enough in your opinion.
