Let's Encrypt renewals have stopped working all of a sudden

Issue #943 new
Greg created an issue

My certs have expired and can’t be renewed.

The web console shows this output:

[2022-06-04 10:22:46] LEScript.INFO: Getting list of URLs for API
[2022-06-04 10:22:46] LEScript.INFO: Requesting new nonce for client communication
[2022-06-04 10:22:46] LEScript.INFO: Account already registered. Continuing.
[2022-06-04 10:22:46] LEScript.INFO: Sending registration to letsencrypt server
[2022-06-04 10:22:46] LEScript.INFO: Sending signed request to https://acme-v02.api.letsencrypt.org/acme/new-acct
[2022-06-04 10:22:47] LEScript.INFO: Account: https://acme-v02.api.letsencrypt.org/acme/acct/46379670
[2022-06-04 10:22:47] LEScript.INFO: Starting certificate generation process for domains
[2022-06-04 10:22:47] LEScript.INFO: Requesting challenge for mail.ourwebsite.com
[2022-06-04 10:22:47] LEScript.INFO: Sending signed request to https://acme-v02.api.letsencrypt.org/acme/new-order
[2022-06-04 10:22:47] LEScript.INFO: Sending signed request to https://acme-v02.api.letsencrypt.org/acme/authz-v3/114548628036
[2022-06-04 10:22:47] LEScript.INFO: Got challenge token for mail.ourwebsite.com
[2022-06-04 10:22:47] LEScript.INFO: Token for mail.ourwebsite.com saved at /opt/www//.well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8 and should be available at http://mail.ourwebsite.com/.well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8
[2022-06-04 10:22:47] LEScript.ERROR: Please check http://mail.ourwebsite.com/.well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8 - token not available
[2022-06-04 10:22:47] LEScript.ERROR: #0 /opt/admin/src/Base/Handler/LeHandler.php(62): Analogic\ACME\Lescript->signDomains(Array)
[2022-06-04 10:22:47] LEScript.ERROR: #1 /opt/admin/src/Base/Controller/LeController.php(71): App\Base\Handler\LeHandler->renew(true)
[2022-06-04 10:22:47] LEScript.ERROR: #2 /opt/admin/vendor/symfony/http-kernel/HttpKernel.php(158): App\Base\Controller\LeController->issueAction(Object(Symfony\Component\HttpFoundation\Request))
[2022-06-04 10:22:47] LEScript.ERROR: #3 /opt/admin/vendor/symfony/http-kernel/HttpKernel.php(80): Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object(Symfony\Component\HttpFoundation\Request), 1)
[2022-06-04 10:22:47] LEScript.ERROR: #4 /opt/admin/vendor/symfony/http-kernel/Kernel.php(201): Symfony\Component\HttpKernel\HttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
[2022-06-04 10:22:47] LEScript.ERROR: #5 /opt/admin/public/index.php(25): Symfony\Component\HttpKernel\Kernel->handle(Object(Symfony\Component\HttpFoundation\Request))
[2022-06-04 10:22:47] LEScript.ERROR: #6 {main}

Logging into the container I try to renew manually:

root@mail:/opt/www# poste -vvv le:renew
10:20:45 DEBUG     [app] Email transport "Symfony\Component\Mailer\Transport\Smtp\SmtpTransport" starting                                                                                                                                                     
10:20:45 DEBUG     [app] Email transport "Symfony\Component\Mailer\Transport\Smtp\SmtpTransport" started                                                                                                                                                      
10:20:45 INFO      [php] User Deprecated: Return value of "App\Base\CommandInternal\RenewCommand::execute()" should always be of the type int since Symfony 4.4, NULL returned.                                                                               
[                                                                                                                                                                                                                                                             
  "exception" => ErrorException {                                                                                                                                                                                                                             
    #message: "User Deprecated: Return value of "App\Base\CommandInternal\RenewCommand::execute()" should always be of the type int since Symfony 4.4, NULL returned."                                                                                        
    #code: 0
    #file: "/opt/admin/vendor/symfony/console/Command/Command.php"
    #line: 258
    #severity: E_USER_DEPRECATED
    trace: {
      /opt/admin/vendor/symfony/console/Command/Command.php:258 { …}
      /opt/admin/vendor/symfony/console/Application.php:1027 { …}
      /opt/admin/vendor/symfony/console/Application.php:273 { …}
      /opt/admin/src/Base/CommandInternal/Application.php:80 {
        App\Base\CommandInternal\Application->doRun(InputInterface $input, OutputInterface $output)^
        ›     $this->setDispatcher($dispatcher);
        ›     return parent::doRun($input, $output);
        › }
      }
      /opt/admin/vendor/symfony/console/Application.php:149 { …}
      /opt/admin/bin/mailserver:42 { …}
    }
  }
]

Next I check the challenge directory. Initially it didn’t contain a challenge, but after I disabled and re-enabled LE through the web console the challenge token appeared (the web console still showed an error renewing). So the token was now there, but seemingly wasn’t being used:

root@mail:/opt/www# ls -la .well-known/acme-challenge/
total 11
drwxrwxrwx. 2 mail mail  5 Jun  4 10:20 .
drwxr-xr-x. 3 root root  3 Mar  7 01:10 ..
-rw-r--r--. 1 mail mail  0 Mar  7 01:10 .gitkeep
-rw-r--r--. 1 root root 19 Jun  4 10:11 TESTING_TOKEN
-rw-r--r--. 1 mail mail 87 Jun  4 10:21 _VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8
root@mail:/opt/www# cat .well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8 
_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8.dTS99YYJSgNHWqzlQMhrr[LAST PART REMOVED]root@mail:/opt/www# 

I try downloading it via wget from within the container:

root@mail:/opt/www# wget http://mail.ourwebsite.com/.well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8
--2022-06-04 10:28:43--  http://mail.ourwebsite.com/.well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8
Resolving mail.ourwebsite.com (mail.ourwebsite.com)... 172.18.0.3, 172.55.0.77
Connecting to mail.ourwebsite.com (mail.ourwebsite.com)|172.18.0.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87 [application/octet-stream]
Saving to: '_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8'

_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8                       0%[                        ]       0  --.-KB/s    in 0s      

2022-06-04 10:28:43 (0.00 B/s) - Connection closed at byte 0. Retrying.

--2022-06-04 10:28:44--  (try: 2)  http://mail.ourwebsite.com/.well-known/acme-challenge/_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8
Connecting to mail.ourwebsite.com (mail.ourwebsite.com)|172.18.0.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87 [application/octet-stream]
Saving to: '_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8'

_VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8                       0%[                        ]       0  --.-KB/s    in 0s      

2022-06-04 10:28:44 (0.00 B/s) - Connection closed at byte 0. Retrying.

^C

I had to ^C because wget kept retrying and re-downloading the file. It did save a file, but an empty one:

root@mail:/opt/www# cat _VE1IYi528t_151XgrKNEv5h1GBSE5UsDMzOOLAWrw8
root@mail:/opt/www# 
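
The pattern above (a 200 response that advertises 87 bytes and then closes after 0) can also be checked from a script. A minimal sketch in Python, assuming Python 3 is available wherever the check is run and that the URL is plain http; `check_token` and `summarize` are hypothetical helper names, not part of Poste:

```python
import http.client
from urllib.parse import urlsplit


def summarize(declared: int, received: int) -> str:
    """Say whether the body length matched the Content-Length header."""
    if received == declared:
        return f"ok: {received} bytes"
    return f"truncated: got {received} of {declared} bytes"


def check_token(url: str, timeout: float = 10.0) -> str:
    """GET a plain-http URL and compare bytes received with Content-Length."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        declared = resp.getheader("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            # The server closed the connection early; keep what did arrive.
            body = exc.partial
    finally:
        conn.close()
    if declared is None:
        return f"no Content-Length header, got {len(body)} bytes"
    return summarize(int(declared), len(body))
```

Running `check_token(...)` on the challenge URL from inside the container should report a truncated transfer if the server behaves as in the wget session. Comparing that with a run from outside the container may help tell whether the problem sits in the server or in the network path.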

I don’t know what else to do except painstakingly switch from Poste’s built-in nginx server to an external one where I manually manage the LE cert. Everything used to work so well, and now it has stopped for no apparent reason.

During this I did upgrade the container from 2.3.4 to 2.3.7 but that didn’t fix anything. (EDIT: I have since upgraded to 2.3.8 but the same problem persists.)

Please help!

Comments (9)

  1. Stephan Krebernik

    As a quick & dirty fix you can add your mail domains to the container’s hosts file. That solved the problem for me.

  2. Greg reporter

    Hmm, I’ve already switched off of the built-in Let’s Encrypt and onto an external one, manually copying in the cert files into /data/ssl, and that works for me.

    It would be great if adding the mail domain to the /etc/hosts file somehow magically fixed this. What do you set as the IP address, the external DNS address? (Maybe next time I have a sysadmin session I’ll give it a try, but am also hoping for an official fix.)

  3. Greg reporter

    Well, that doesn’t seem to have helped.

    [2022-07-06 17:24:16] LEScript.INFO: Getting list of URLs for API
    [2022-07-06 17:24:16] LEScript.INFO: Requesting new nonce for client communication
    [2022-07-06 17:24:16] LEScript.INFO: Account already registered. Continuing.
    [2022-07-06 17:24:16] LEScript.INFO: Sending registration to letsencrypt server
    [2022-07-06 17:24:16] LEScript.INFO: Sending signed request to https://acme-v02.api.letsencrypt.org/acme/new-acct
    [2022-07-06 17:24:16] LEScript.INFO: Account: https://acme-v02.api.letsencrypt.org/acme/acct/46379670
    [2022-07-06 17:24:16] LEScript.INFO: Starting certificate generation process for domains
    [2022-07-06 17:24:16] LEScript.INFO: Requesting challenge for mail.example.com
    [2022-07-06 17:24:16] LEScript.INFO: Sending signed request to https://acme-v02.api.letsencrypt.org/acme/new-order
    [2022-07-06 17:24:17] LEScript.INFO: Sending signed request to https://acme-v02.api.letsencrypt.org/acme/authz-v3/127745045106
    [2022-07-06 17:24:17] LEScript.INFO: Got challenge token for mail.example.com
    [2022-07-06 17:24:17] LEScript.INFO: Token for mail.example.com saved at /opt/www//.well-known/acme-challenge/Fq8Vw-XXXXX-REMOVED-XXXXX and should be available at http://mail.example.com/.well-known/acme-challenge/Fq8Vw-XXXXX-REMOVED-XXXXX
    [2022-07-06 17:24:17] LEScript.ERROR: Please check http://mail.example.com/.well-known/acme-challenge/Fq8Vw-XXXXX-REMOVED-XXXXX - token not available
    [2022-07-06 17:24:17] LEScript.ERROR: #0 /opt/admin/src/Base/Handler/LeHandler.php(62): Analogic\ACME\Lescript->signDomains(Array)
    [2022-07-06 17:24:17] LEScript.ERROR: #1 /opt/admin/src/Base/Controller/LeController.php(71): App\Base\Handler\LeHandler->renew(true)
    [2022-07-06 17:24:17] LEScript.ERROR: #2 /opt/admin/vendor/symfony/http-kernel/HttpKernel.php(158): App\Base\Controller\LeController->issueAction(Object(Symfony\Component\HttpFoundation\Request))
    [2022-07-06 17:24:17] LEScript.ERROR: #3 /opt/admin/vendor/symfony/http-kernel/HttpKernel.php(80): Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object(Symfony\Component\HttpFoundation\Request), 1)
    [2022-07-06 17:24:17] LEScript.ERROR: #4 /opt/admin/vendor/symfony/http-kernel/Kernel.php(201): Symfony\Component\HttpKernel\HttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
    [2022-07-06 17:24:17] LEScript.ERROR: #5 /opt/admin/public/index.php(25): Symfony\Component\HttpKernel\Kernel->handle(Object(Symfony\Component\HttpFoundation\Request))
    [2022-07-06 17:24:17] LEScript.ERROR: #6 {main}
    

    FWIW, there was an existing line for mail.domain.tld there that pointed to the Docker IP address, and I replaced that with 127.0.0.1 and then attempted the renewal. Maybe I need to do these steps in a specific order? (Before the server boots up?)
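
Before retrying, it can help to confirm which address the hosts file actually maps the mail domain to. A minimal sketch, assuming the standard hosts-file layout (address first, then one or more names); `hosts_entries` is a hypothetical helper, not something Poste provides:

```python
def hosts_entries(hosts_text: str, hostname: str) -> list:
    """Return the IP addresses that a hosts file maps to hostname."""
    wanted = hostname.lower()
    ips = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        names = [name.lower() for name in fields[1:]]
        if wanted in names:
            ips.append(fields[0])
    return ips
```

For example, `hosts_entries(open("/etc/hosts").read(), "mail.domain.tld")` run inside the container lists every address currently mapped to that name, which makes it easy to see whether the Docker IP or 127.0.0.1 is in effect.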

  4. Greg reporter

    I’m having a new problem that I think is related to this one.

    When I try to curl the token from the terminal inside the container I get:

    root@mail:/# curl http://mail.domain.tld/.well-known/acme-challenge/Fq8Vw-XXXX*
    curl: (18) transfer closed with 87 bytes remaining to read
    

    I am now getting this same error when trying to fetch any static file from the server when simply running it on a different port (e.g. HTTP_PORT=8221) so that I can run my own nginx proxy in front of it.

    root@mail:/# curl http://localhost:8221/admin/resources/jquery-2.1.3.min.js
    curl: (18) transfer closed with 84320 bytes remaining to read
    

    So I now have two problems related to this bug:

    1. Either I run poste directly, in which case I can’t renew the HTTPS cert because of this error
    2. Or I run it on a different port, with a proxy in front of it, in which case I can’t load webmail

    This is maddening. HELP!!

  5. Greg reporter

    OK, so I think I might have fixed my nginx proxy setup. (EDIT: nope. See next comment.) The direct-poste Let’s Encrypt setup is still suffering from the problem above (since I can’t edit Poste’s nginx configuration), but when following this suggestion and adding the following lines to my nginx proxy file, I can now load webmail again!

    add_header X-Accel-Buffering no;
    proxy_buffering off;
    

    This is what my nginx proxy config now looks like:

    server {
      server_name @DOMAIN@;
      listen 80;
      listen 443 ssl;
      ssl_certificate /etc/letsencrypt/live/@DOMAIN@/fullchain.pem;
      ssl_certificate_key /etc/letsencrypt/live/@DOMAIN@/privkey.pem;
    
      # https://nginx.org/en/docs/http/ngx_http_proxy_module.html
      location / {
        if ($scheme != "https") {
          return 301 https://$host$request_uri;
        }
        proxy_pass http://poste:8221;
        # proxy_pass http://poste:80;
        # this proxy buffering stuff is to fix this: https://bitbucket.org/analogic/mailserver/issues/943/lets-encrypt-renewals-have-stopped-working#comment-63824152
        # tip from: https://github.com/sameersbn/docker-gitlab/issues/226#issuecomment-87966838
        add_header X-Accel-Buffering no;
        proxy_buffering off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
      }
    
      location /.well-known {
        root /var/www/well-known/@DOMAIN@;
        try_files $uri $uri/ =404;
        add_header Cache-Control "private, max-age=0, no-cache";
      }
    }
    

    And for those curious, my newLISP command for copying the cert looks like this (some function definitions not visible here):

    (setf nginx-container (s->c "nginx"))
    
    (define (nginx-cert domain file)
      (format "%s:/etc/letsencrypt/live/%s/%s" nginx-container domain file))
    
    (define (poste-cert file)
      (format {./vol/poste/_override/data/ssl/%s} file))
    
    (define (cmd-cert-copy domain)
      (and (null? domain) (die "<domain> missing!"))
      (unless (directory? "./vol/poste/_override/data/ssl")
        (as-root {mkdir -vp ./vol/poste/_override/data/ssl}))
      (as-root {docker cp -L "%s" "%s"} (nginx-cert domain "chain.pem") (poste-cert "ca.crt"))
      (as-root {docker cp -L "%s" "%s"} (nginx-cert domain "cert.pem") (poste-cert "server.crt"))
      (as-root {docker cp -L "%s" "%s"} (nginx-cert domain "privkey.pem") (poste-cert "server.key"))
      (if (catch (s->c "poste") 'err)
        (docker-compose {restart poste})
        (println (strip-indentation {poste isn't running, you might want to start it
                                     manually with:
    
                                       ./cmd service start poste}))))
    

    The important part is the 3 docker cp -L commands to copy from <nginx-proxy-container>:/etc/letsencrypt/live/<domain>/<file> to ./vol/poste/_override/data/ssl/<file>, and my poste container is set up to copy files and folders from ./vol/poste/_override into /.
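
For readers who don't use newLISP, the three copies can be sketched in Python. This assumes the caller is allowed to run docker (the original runs it as root) and leaves out the restart of the poste container; `CERT_FILES`, `cert_copy_commands` and `copy_certs` are hypothetical names, and the container name is whatever your proxy container is called:

```python
import subprocess

# Source file in the proxy container -> file name used under data/ssl,
# following the mapping in the newLISP command above.
CERT_FILES = {
    "chain.pem": "ca.crt",
    "cert.pem": "server.crt",
    "privkey.pem": "server.key",
}


def cert_copy_commands(container: str, domain: str, dest_dir: str) -> list:
    """Build one `docker cp -L` argument list per certificate file."""
    return [
        ["docker", "cp", "-L",
         f"{container}:/etc/letsencrypt/live/{domain}/{src}",
         f"{dest_dir}/{dst}"]
        for src, dst in CERT_FILES.items()
    ]


def copy_certs(container: str, domain: str, dest_dir: str) -> None:
    """Run the copies, stopping at the first one that fails."""
    for argv in cert_copy_commands(container, domain, dest_dir):
        subprocess.run(argv, check=True)
```

The `-L` flag matters because the files under /etc/letsencrypt/live are symlinks, so docker cp has to follow them to copy the actual certificate data.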

  6. Greg reporter

    Well that didn’t last long. The proxy_buffering off; thing with external nginx seemed to work briefly but then most of the resources on the webmail and admin pages stopped loading again.

    To summarize, I’ve discovered two problems (possibly related to each other):

    1. Poste’s built-in Let’s Encrypt has stopped working completely.
    2. Many of the static resources like jquery-2.1.3.min.js on the admin and webmail pages have stopped loading too.

  7. Greg reporter

    Update!

    After migrating to a server that doesn’t use ZFS the problem has gone away, along with other problems.

    I’m not sure if ZFS was the cause of this specific problem, but it might have been.
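
To see whether a given path is backed by ZFS, one option is to look at the mount table. A minimal sketch, assuming a Linux system where /proc/mounts is readable; `fs_type_for` is a hypothetical helper, and it ignores the octal escapes that /proc/mounts uses for spaces in paths:

```python
import os


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains path."""
    target = os.path.normpath(path) + "/"
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Prefer the longest matching mount point; on ties, later entries win.
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

For example, `fs_type_for("/opt/www", open("/proc/mounts").read())` returns the type of whichever mount holds the web root. Inside a container the answer depends on the storage driver and on which volumes are mounted in, so it may be worth running on the host as well.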

    Thankfully the built-in Let’s Encrypt renewal appears to be working again!
