async_exec over SSL connection can fail on Ubuntu due to weird interaction with OpenSSL

Issue #142 invalid
Caio Chassot
created an issue

Where cert is an instance of OpenSSL::X509::Certificate, and conn an instance of PGconn connected with sslmode:'require', calling conn.async_exec after a call to cert.verify(key) will break with an SSL Error from PG, but only if the verify call returned false.

  • Everything works fine with sslmode=disable.
  • If verify returns true everything works.
  • Calling exec instead of async_exec also works fine. (But sequel uses async_exec internally.)
  • So far this was only reproducible on Ubuntu, with both 10.04 and 12.04; works fine on OS X. (Update: also repro'd on CentOS release 6.3.)
  • The server version/OS doesn't seem to matter much, as long as SSL is required it'll fail. Tested with many Heroku Postgres instances of different plans, and a local OS X ssl-enabled instance.
  • The specific SSL error PG gives me seems to vary a lot depending on previous successful queries and/or other openssl operations. I didn't try to isolate what causes the message to vary.
  • Varying the versions of libpq and openssl made no difference; I tested with both whatever is packaged in the Ubuntu releases and manually compiled latest stable source releases. Changing a few compilation options for openssl (-O0, no-asm, no-threads) also made no difference.
  • The sample code uses a couple certificates to test verify. The certificates used make no difference, other than you need to be able to get both true and false from verify to observe the effects of each result.

Some sample code here: (There's a comment at the bottom with the output I get when running on ubuntu.)

This sample code will always fail with the following error:

{{{
PG::Error: SSL error: block type is not 01
	from (irb):60:in `async_exec'
}}}

Here are a few other messages I got earlier before reducing this to a test script:

{{{
SSL error: wrong signature length: SELECT * FROM…
SSL error: EVP lib: SELECT * FROM…
SSL error: data too large for modulus: SELECT * FROM…
SSL error: data too large for modulus: SELECT * FROM…
}}}

The varying messages lead me to believe the particular message is unimportant, but that there's some state/data corruption going on.

Comments (39)

  1. Michael Granger repo owner

    I moved your (excellently formatted, and awesome) report to the ticket body myself.

    I do Github, at least as a mirror, but until it supports Mercurial for version control, it's not an option for the main project site.

    Anyway, thanks for the excellent report. I'll take a look this weekend.

  2. Lars Kanis

    After a longer debugging session, I was able to track down the cause of this weird behavior. My test environment is:

    • PostgreSQL 9.1.4
    • Ruby 1.9.3p125
    • Ubuntu 12.04

    The OpenSSL library maintains a thread-local error queue that is filled on any error that occurs within the library. Ruby's openssl extension and libpq both use the same error queue when running in the same thread. Most errors in the openssl extension are read out and raised as an Exception, but this is not true for Certificate#verify. It only returns true or false and leaves OpenSSL's error queue untouched.

    On the other hand, libpq uses SSL_get_error() to retrieve the state of the SSL connection. See [1] and [3]. This call does not inspect the thread's error queue in the blocking case, so PG::Connection#exec does not raise an error (at least for the simple test case). With the async call, the error queue is inspected and an Exception is raised for the previous verification error.
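    The mechanism described above can be illustrated with a toy model. This is only a sketch: ToySSL and all of its methods are hypothetical names that exist in neither OpenSSL, libpq, nor ruby-pg, and get_error is a heavily simplified stand-in for SSL_get_error() that assumes a positive return code short-circuits before the queue is looked at.

```ruby
# Toy model only: none of these names exist in OpenSSL, libpq, or ruby-pg.
module ToySSL
  # Stand-in for OpenSSL's per-thread error queue.
  def self.queue
    Thread.current[:toy_ssl_errors] ||= []
  end

  # Like the verify call discussed here: failure is reported through the
  # return value only, and the error entry stays on the queue.
  def self.verify(signature_ok)
    return true if signature_ok
    queue << "block type is not 01"
    false
  end

  # Simplified SSL_get_error(): a positive return code means success and the
  # queue is never looked at; otherwise a queued error wins over "want read".
  def self.get_error(ret)
    return :none if ret > 0
    queue.empty? ? :want_read : :ssl_error
  end

  # Blocking read: waits until data has arrived, so the return code is positive.
  def self.blocking_read
    get_error(1)
  end

  # Non-blocking read with no data ready yet: the return code is -1.
  def self.nonblocking_read
    get_error(-1)
  end
end

ToySSL.verify(false)           # leaves a stale entry on the queue
puts ToySSL.blocking_read      # stale entry goes unnoticed
puts ToySSL.nonblocking_read   # stale entry is reported as an SSL error
```

    In the model, the only difference between the two reads is the return code handed to get_error, which mirrors why exec and async_exec behave differently after a failed verification.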

    The simple solution is to call OpenSSL.errors after each failed certificate verification. Since the comment on OpenSSL.errors [2] says that this is probably an error in the openssl extension, I'm not sure whether something should be fixed there.
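    A minimal sketch of that workaround follows. verify_and_drain and self_signed_cert are hypothetical helpers written for this illustration, not part of ruby-openssl or ruby-pg; the throwaway certificate exists only so that verify can return both true and false. Depending on the ruby-openssl version in use, the returned reasons may be empty.

```ruby
require "openssl"

# Hypothetical helper, not part of ruby-openssl or ruby-pg.
# Returns [ok, reasons]; reasons holds whatever OpenSSL.errors reported.
def verify_and_drain(cert, key)
  ok = cert.verify(key)
  # OpenSSL.errors returns the queued error strings and empties the queue,
  # so nothing is left behind for a later SSL operation in this thread.
  reasons = OpenSSL.errors
  [ok, reasons]
end

# Throwaway self-signed certificate, used only to get a true and a false result.
def self_signed_cert(key)
  name = OpenSSL::X509::Name.parse("/CN=example")
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = name
  cert.issuer = name
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  cert
end

signing_key = OpenSSL::PKey::RSA.new(2048)
other_key = OpenSSL::PKey::RSA.new(2048)
cert = self_signed_cert(signing_key)

matching_ok, _ = verify_and_drain(cert, signing_key)
wrong_ok, wrong_reasons = verify_and_drain(cert, other_key)
puts "matching key: #{matching_ok}"
puts "wrong key: #{wrong_ok} (#{wrong_reasons.length} queued error(s) drained)"
```

    Draining after every call, rather than only after a false result, keeps the helper simple and costs little.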

    Possibly libpq should be changed so that OpenSSL's thread-local error queue is cleared before every SSL operation, as described in [1].

    I don't think that anything is wrong in ruby-pg.

    What do you think?




  3. Caio Chassot reporter

    Thanks for taking a look.

    Matt Zimmerman has looked into this on our end too and arrived at the same conclusion, although his suggestion would be to fix this in Ruby's openssl library.

    I think as far as ruby-pg is concerned you can close this. Let me know if you'd like to be kept in the loop as we file reports with ruby and/or postgres.

  4. Caio Chassot reporter

    To be clear, we haven't filed any reports yet.

    It seems you understand the issue in more detail than we do, and could do a more thorough report. If you could please go ahead and file the issues, that would be great.

    I'll then ask Zimmerman to comment on them if he has anything to add, and if need be, talk to our postgres and ruby-core contacts to make sure it gets some attention.

  5. Lars Kanis

    You said that Matt Zimmerman suggested fixing this in Ruby's openssl library. Did he say how it should be fixed? I currently see these variants:

    1. Return the OpenSSL error list from Certificate#verify instead of true/false. This would change the API in an incompatible way, so it is not a real option.
    2. Drop the error list at the end of Certificate#verify. Then there would be no way to get the particular error text. Maybe add another method that works as in 1.
    3. Add a note to the documentation that suggests the user should call OpenSSL.errors after a failed call to Certificate#verify.

    I don't really like any of these, right now.

  6. Lars Kanis

    OK, thank you! I don't know whether you tested Matt's patch, but it should have no effect on your sample, since that code path isn't used there. Nevertheless, the change to libpq would be similar: adding ERR_clear_error() in fe-secure.c.

  7. Michael Granger repo owner

    I agree with Lars's assessment. The documentation for SSL_get_error(3) is pretty unambiguous about the need to clear the error queue first, so it shouldn't be a hard case to make to the PostgreSQL guys.

  8. Matt Zimmerman

    @Lars Kanis my preference is to always check the return code of the call to find out if there was an error, and then use SSL_get_error to get more detail iff the call failed. Relying on the error queue to determine success or failure seems more complex to me, and creates the possibility of bugs like this one (which are impossible if the return code is checked).

  9. Michael Granger repo owner

    While I generally agree that checking the return code is better, clearing the queue is less change for all of the cases in question, and I think the heuristic of changing as little logic as possible when submitting patches for other people's code trumps the error-check. Just my 2 cents.

  10. Eric Woodruff

    larskanis, is "Add a note in the documentation that suggest the user should call OpenSSL.errors after a failed call to Certificate#verify" still valid? Just to be sure, it is not OpenSSL.errors.clear or something?

  11. Lars Kanis

    Unfortunately, neither was the patch for PostgreSQL accepted nor was the issue addressed in Ruby's OpenSSL binding. So yes, you should always call OpenSSL.errors after a failed call to Certificate#verify.

  12. kritik

    We are getting the same issue with ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-linux] and the latest ruby-pg. It happens only in one particular situation: all requests work except at this one place.

    What should I do to test?

  13. kritik

    The same with ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]. What I have found is that it occurs in only one method (I cannot read from the database). Everything works in the previous method.

    UPD: OK, after deeper testing I've found that async_exec fails here (error: #<PG::ConnectionBad: PQconsumeInput() SSL error: wrong signature length>), whereas exec works fine.

  14. Lars Kanis

    @kritik This issue is still open and still not fixed in ruby-openssl:

    This is not an issue in ruby-pg, so there is no reason to reopen this ticket. async_exec() is affected by this issue and exec() is not, but this is only an implementation detail.

    Without a change in ruby-openssl, you should generally call OpenSSL.errors after each Certificate#verify. This is also wise in order to retrieve the reason, why the certificate isn't valid.

  15. kritik

    @Lars Kanis OK, it's just weird that the problem persists in one particular place, and only when the request comes from a controller in Rails. From the command line it's OK :) Anyway, I will try to push people on ruby-lang.

  16. darkcloud

    We were invoking Dnsruby::Dnssec.validate within Rails and encountering this issue. Invoking OpenSSL.errors when #validate raises exceptions solved it for us. Thank you, everyone, for the deep debugging, elegant explanation and of course the workaround.
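    For code paths that raise instead of returning false, the same workaround can be wrapped around a block. This is a sketch only: draining_openssl_errors is a hypothetical helper, not part of Dnsruby, Rails, or ruby-openssl, and the raise inside the block merely stands in for a validation call that fails.

```ruby
require "openssl"

# Hypothetical wrapper, not part of Dnsruby, Rails, or ruby-openssl.
# Runs the block and then empties OpenSSL's error queue for this thread,
# whether the block returned normally or raised.
def draining_openssl_errors
  yield
ensure
  # The value of an ensure clause is discarded, so the block's result
  # (or its exception) still reaches the caller.
  OpenSSL.errors
end

begin
  draining_openssl_errors do
    # Stand-in for a validation call that raises on failure.
    raise ArgumentError, "validation failed"
  end
rescue ArgumentError => e
  puts "rescued: #{e.message}"
end
```

    Any database query issued afterwards in the same thread then starts with an empty OpenSSL error queue.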
