HAProxy and Let's Encrypt renewal challenge fails

hellfire · 30 May 2024 07:45

First of all, the Let’s Encrypt certificate renewal challenge worked great until today.

Since the renewal is done by a cron job on IPFire, I did not notice any issue until now. Today, the various sites hosted behind IPFire are no longer reachable because their certificates have expired.

The renewal process fails with the following error (domain names and public IP anonymized):

Processing XXX.mydomain.de
 + Checking expire date of existing cert...
 + Valid till May 30 04:00:47 2024 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for XXX.mydomain.de
 + 1 pending challenge(s)
 + Deploying challenge tokens...
 + Responding to challenge for XXX.mydomain.de authorization...
XXX.mydomain.de failed! + Cleaning challenge tokens...
 + Challenge validation has failed :(
ERROR: Challenge is invalid! (returned: invalid) (result: ["type"]      "http-01"
["status"]      "invalid"
["error","type"]        "urn:ietf:params:acme:error:connection"
["error","detail"]      "During secondary validation: 178.27.xxx.xx: Fetching http://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4: Timeout during connect (likely firewall problem)"
["error","status"]      400
["error"]       {"type":"urn:ietf:params:acme:error:connection","detail":"During secondary validation: 178.27.xxx.xx: Fetching http://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4: Timeout during connect (likely firewall problem)","status":400}
["url"] "https://acme-v02.api.letsencrypt.org/acme/chall-v3/357319482982/oR-gvw"
["token"]       "9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4"
["validationRecord",0,"url"]    "http://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4"
["validationRecord",0,"hostname"]       "XXX.mydomain.de"
["validationRecord",0,"port"]   "80"
["validationRecord",0,"addressesResolved",0]    "178.27.xxx.xx"
["validationRecord",0,"addressesResolved"]      ["178.27.xxx.xx"]
["validationRecord",0,"addressUsed"]    "178.27.xxx.xx"
["validationRecord",0]  {"url":"http://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4","hostname":"XXX.mydomain.de","port":"80","addressesResolved":["178.27.xxx.xx"],"addressUsed":"178.27.xxx.xx"}
["validationRecord",1,"url"]    "https://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4"
["validationRecord",1,"hostname"]       "XXX.mydomain.de"
["validationRecord",1,"port"]   "443"
["validationRecord",1,"addressesResolved",0]    "178.27.xxx.xx"
["validationRecord",1,"addressesResolved"]      ["178.27.xxx.xx"]
["validationRecord",1,"addressUsed"]    "178.27.xxx.xx"
["validationRecord",1]  {"url":"https://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4","hostname":"XXX.mydomain.de","port":"443","addressesResolved":["178.27.xxx.xx"],"addressUsed":"178.27.xxx.xx"}
["validationRecord"]    [{"url":"http://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4","hostname":"XXX.mydomain.de","port":"80","addressesResolved":["178.27.xxx.xx"],"addressUsed":"178.27.xxx.xx"},{"url":"https://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4","hostname":"XXX.mydomain.de","port":"443","addressesResolved":["178.27.xxx.xx"],"addressUsed":"178.27.xxx.xx"}]
["validated"]   "2024-05-30T06:59:31Z")

I’ve already checked or tried the following:

The necessary vhost line in IPFire’s Apache http conf is in place and works as before.
Requesting the above-mentioned token file from Firefox works using the path https://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4
Of course I get a certificate-expired warning, which Firefox lets me bypass after confirming it.
I can request the token file from an IPFire SSH shell using the URI http://XXX.mydomain.de/.well-known/acme-challenge/9NUgxoqAKvHt_H77xE6PCnFdCSuooZ2rnrIyIokM1-4 over plain http, to see whether the redirection from http to https (HAProxy configuration!) still works. Yes, it works; I get the warning about the expired certificate here as well, of course.
I have GeoIP blocking in place within IPFire, but I disabled it completely, reloaded the FW rules and re-ran the renewal process - still fails!
Previously, the well-known countries (US, UK) where LE’s servers are hosted were excluded from Geo blocking. So, basically, this should work without any change on my side.
Various restarts of HAProxy, to no avail!

The next step I’m testing:

Disable SSL termination (redirects from http to https) in HAProxy’s config and start the certificate renewal again. Right now that’s not possible, since LE blocks the process because of too many attempts.

Nevertheless, I’m trying to find out what is happening and am looking for some hints from the community.

So I would be happy to get some input on what to check on my side.

Thanks, Michael

Edit: HAproxy logs during the renewal process show:

10:48:36	haproxy[21501]: 	162.142.125.34:33608 [30/May/2024:10:48:36.161] stats stats/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 2/2/0/0/0 0/0 "GET / HTTP/1.1"
10:48:30	haproxy[21501]: 	34.222.23.129:28680 http_https letsencrypt/letsencrypt 200 242 ---- "GET /.well-known/acme-challenge/6FC2-H2NAiWQU5k7xxvFwBM3AiEwpTdz3l_dbBqkGkY HTTP/1.1"
10:48:29	haproxy[21501]: 	3.145.10.139:39082 http_https letsencrypt/letsencrypt 200 242 ---- "GET /.well-known/acme-challenge/6FC2-H2NAiWQU5k7xxvFwBM3AiEwpTdz3l_dbBqkGkY HTTP/1.1"
10:48:26	haproxy[21501]: 	162.142.125.34:33956 [30/May/2024:10:48:26.892] stats stats/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 2/2/0/0/0 0/0 "GET / HTTP/1.1"
10:48:22	haproxy[21501]: 	162.142.125.34:33948 [30/May/2024:10:48:22.847] stats stats/<NOSRV> -1/-1/-1/-1/0 400 0 - - PR-- 2/2/0/0/0 0/0 "<BADREQ>"
10:48:20	haproxy[21501]: 	66.133.109.36:61141 http_https letsencrypt/letsencrypt 200 242 ---- "GET /.well-known/acme-challenge/6FC2-H2NAiWQU5k7xxvFwBM3AiEwpTdz3l_dbBqkGkY HTTP/1.1"

Judging by the first line above, it looks to me as if HAProxy does not find any server to handle the request (<NOSRV>).

Edit2: Since there is a debug web site for LE at https://letsdebug.net/, I ran a test there.

During the test, the HAProxy log shows this line:

11:41:21	haproxy[14946]: 	66.133.109.36:40169 http_https~ letsencrypt/letsencrypt 404 340 ---- "GET /.well-known/acme-challenge/tn3RQ4apU59MV-QybN6H9fIV8GNOtM-t2RFcix_r6sU HTTP/1.1"

Does this mean that the Apache config does not work anymore?

# cat /etc/httpd/conf/vhosts.d/acme.conf
Listen 5001
<VirtualHost *:5001>
        DocumentRoot "/var/www/dehydrated"
  Alias /.well-known/acme-challenge/ /var/www/dehydrated/
</VirtualHost>
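
One way to narrow down where the request gets lost is to place a test file in the challenge directory and fetch it both directly from the Apache vhost and through HAProxy. The sketch below assumes the directory and port from acme.conf above; the function names and the test file name are made up for illustration. curl is told to follow redirects and to ignore certificate errors, because the Let’s Encrypt validator also follows a redirect to https and does not validate the certificate there.

```shell
#!/bin/sh
# Sketch only. Assumed: directory and port as in acme.conf; names are hypothetical.
WELLKNOWN="${WELLKNOWN:-/var/www/dehydrated}"
BACKEND="${BACKEND:-127.0.0.1:5001}"

# Print the final HTTP status code for a URL ("000" if the connection fails).
# -L follows the http -> https redirect, -k ignores the expired certificate.
http_status() {
    curl -s -L -k -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Create a throwaway token file, fetch it via both paths, then remove it.
check_challenge_path() {
    host="$1"
    token="selftest-$$"
    echo ok > "$WELLKNOWN/$token" || return 1
    echo "backend:  $(http_status "http://$BACKEND/.well-known/acme-challenge/$token")"
    echo "frontend: $(http_status "http://$host/.well-known/acme-challenge/$token")"
    rm -f "$WELLKNOWN/$token"
}

# Usage on the firewall:
#   check_challenge_path XXX.mydomain.de
```

If the backend line shows 200 but the frontend line does not, the problem sits in front of Apache (HAProxy or the firewall). A request made from inside the network does not necessarily pass the same firewall rules as one coming from the internet, so this only checks the Apache and HAProxy part.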

In my documentation, I’ve previously noted this error which usually points to some geoblocking in place. As written above, I’ve disabled Geo blocking in WebIf as well as IPS completely before running the debug test and during certification renewal, to no avail.

How can I check whether geo blocking is still active in IPFire, as seems to be the case?
It should be disabled, right?

A check in the shell, however, gives:

# iptables -L |grep LOCATIONBLOCK
LOCATIONBLOCK  all  --  anywhere             anywhere
LOCATIONBLOCK  all  --  anywhere             anywhere
Chain LOCATIONBLOCK (2 references)

Is this an active blocking or not?

nickh · 30 May 2024 09:46

No that is not a location block. A better query would be iptables -nvL | egrep 'LOCATIONBLOCK|Chain' because that would show you in which Chain those lines appear. Even then it is meaningless. You then need to inspect the LOCATIONBLOCK chain which can be done in the UI or iptables -nvL LOCATIONBLOCK. That holds the key to what is being location blocked.

BTW, are you forcing http → https everywhere? I don’t think LE likes that and needs http.

[edit]
I actually prefer iptables -nvL | egrep 'LOCATIONBLOCK|Chain|pkts' as you then also get the column headers.
[/edit]

hellfire · 30 May 2024 11:03

Yes, that’s the main purpose of fetching and renewing SSL certificates on IPFire. I’m using HAProxy’s SSL termination feature, first to redirect http requests to https and, of course, to reach internal web servers that do not have any certificates of their own.

HAProxy does the whole job, and I’ve had this redirection in place for quite a long time now, not just for the last three months, which is the interval at which I have to renew the certs.
This means the previous renewal process worked perfectly well. In the meantime I’ve upgraded IPFire to core 185. Maybe this is the cause of the issue now?

Btw, I’ve switched off http → https for testing purposes, to no avail either. On the other hand, I can see in the logs above that LE tries both http and https - at least I guess ;-(

When firing this command

iptables -nvL | egrep 'LOCATIONBLOCK|Chain|pkts

I get a prompt instead of valid output. Did I do something wrong?

bonnietwin · 30 May 2024 11:28

There have been several updates to haproxy but not in CU185.

CU184 updated haproxy from 2.8.5 to 2.9.2
CU183 updated haproxy from 2.8.1 to 2.8.5
CU173 updated haproxy from 2.7.1 to 2.8.1

Maybe one of these haproxy updates changed something in how the Lets Encrypt update is done.

hellfire · 30 May 2024 11:54

Thanks Adolf for your reply.

Indeed, I upgraded from 173 to 184 recently, so maybe that’s one possible cause.

As written above, I’m using a VHOST configuration in Apache that is used as the target for the LE renewal challenge:

Listen 5001

<VirtualHost *:5001>
	DocumentRoot "/var/www/dehydrated"
  Alias /.well-known/acme-challenge/ /var/www/dehydrated/
</VirtualHost>

This means Apache should listen on port 5001 for any incoming request that HAProxy forwards to this specific port. Usually this works well, but not today.

I would have expected this port to show up when executing the following command, but instead of an IPv4 line I just see an IPv6 one.

❯ netstat -na | grep ':5001.*LISTEN'
tcp6       0      0 :::5001                 :::*                    LISTEN

I wonder if this is the cause of all my troubles, because when firing this command, I see some correct ports I would expect IPFire uses:

❯ netstat -na | grep ':80.*LISTEN'
tcp        0      0 172.18.0.1:8080         0.0.0.0:*               LISTEN
tcp        0      0 192.168.0.1:8080        0.0.0.0:*               LISTEN
tcp        0      0 172.17.0.2:80           0.0.0.0:*               LISTEN

So why is IPv4 port 5001 not shown? Is this an Apache issue?

nickh · 30 May 2024 13:47

You’ve missed the closing quote. iptables -nvL | egrep 'LOCATIONBLOCK|Chain|pkts'

Often, when a port is listening on both IPv4 and IPv6, netstat just shows the IPv6 line. I am not sure why, but have never dug into it.
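
A likely explanation, as far as I know: on Linux a socket bound to the IPv6 wildcard address [::] also accepts IPv4 connections unless it is marked IPv6-only, so netstat lists a single tcp6 line for it. The system-wide default is the sysctl net.ipv6.bindv6only, although a program can override it per socket. The helper below (a made-up name) merely translates that value into words:

```shell
#!/bin/sh
# Interpret the value of net.ipv6.bindv6only (the helper name is made up).
# Note: an application can still override this default per socket.
v6only_meaning() {
    case "$1" in
        0) echo "dual-stack: a [::] listener also accepts IPv4" ;;
        1) echo "v6-only: IPv4 needs its own listener" ;;
        *) echo "unexpected value: $1" ;;
    esac
}

# Usage on the firewall:
#   v6only_meaning "$(sysctl -n net.ipv6.bindv6only)"
# Direct IPv4 check against the vhost port:
#   curl -4 -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:5001/
```

If the curl check over IPv4 gets an HTTP status back, the missing tcp line in netstat is only cosmetic.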

ummeegge · 30 May 2024 16:49

Hellfire:

Listen 5001

<VirtualHost *:5001>
	DocumentRoot "/var/www/dehydrated"
  Alias /.well-known/acme-challenge/ /var/www/dehydrated/
</VirtualHost>

So why is IPv4 port 5001 not shown? Is this an Apache issue?

Did you try to change the binding to an IPv4 related address, like e.g.

Listen 0.0.0.0:5001
[or]
Listen [Your_ipv4_address]:5001

→ Binding to Addresses and Ports - Apache HTTP Server Version 2.4 ?

The -4 flag should restrict the output to IPv4, in netstat as well as in ss.

Best,

Erik

nickh · 30 May 2024 17:01

It isn’t that (netstat). Here is a non-IPF system:

[root@server ~]# netstat -nplt | grep cyrus
tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN      12578/cyrus-master
tcp        0      0 127.0.0.1:2003          0.0.0.0:*               LISTEN      12578/cyrus-master
tcp        0      0 127.0.0.1:4190          0.0.0.0:*               LISTEN      12578/cyrus-master
tcp6       0      0 :::993                  :::*                    LISTEN      12578/cyrus-master
tcp6       0      0 :::143                  :::*                    LISTEN      12578/cyrus-master
[root@server ~]# netstat -nplt4 | grep cyrus
tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN      12578/cyrus-master
tcp        0      0 127.0.0.1:2003          0.0.0.0:*               LISTEN      12578/cyrus-master
tcp        0      0 127.0.0.1:4190          0.0.0.0:*               LISTEN      12578/cyrus-master

Now I know cyrus-imapd is listening on port 993 with IPv4 as that is how I use it but it does not show with a -4 switch.

ummeegge · 30 May 2024 17:07

Did you get the same results with ss ? The flags should be pretty much the same…

hellfire · 30 May 2024 17:17

Yes, I already tried method one; it did not change anything:

<VirtualHost 0.0.0.0:5001>

But I’ve found the solution, i.e. why the challenge process does not work any more!
I have to point out, however, that I did not change any FW rule, any FW option, or any FW group after updating IPFire or during the last months!

Maybe you guys can explain what’s happening?

Geo blocking was at times activated and at times deactivated during my tests above, and it is still activated while performing the steps below!

In addition to the Geo blocking, I had an incoming FW rule set up to let the LE challenge pass the firewall and access a local IP address: IPFire itself.
This is necessary because the LE http requests have to reach HAProxy. These are the prerequisites that worked perfectly in the past, until today, or maybe until I updated to core 185(?)

This mentioned incoming FW rule is set up as:

And the location group is configured as:

I noticed that all requests from LE servers originated from a US IP address, like this one:

# location lookup 66.133.109.36
66.133.109.36:
  Network                 : 66.133.96.0/19
  Country                 : United States of America
  Autonomous System       : AS13649 - Flexential Colorado Corp.

This is an excerpt of the FW log that unmistakably shows the IP located in the US.

I would assume that the incoming firewall rule and the location group above would pass any traffic from outside, especially from those US IP addresses, straight to IPFire’s red interface. It did so in the past.

After creating a new firewall rule, my LE challenge issues were solved.

It’s not the different protocol setting in the new rule that made the difference (for testing purposes I changed it back to the preset service group WWW from the first FW rule); it’s the source!

Can somebody explain why a location group containing the US does not work, although the source is located in the United States?

Or did I, by using “All” as the source network, mistakenly open up some other needless sources, and that cured my issues?

Edit: I guess I’ve been hit by this change on Let’s Encrypt’s side: https://community.letsencrypt.org/t/unexpected-renewal-failures-since-april-2024-please-read-this/216830
This does not explain, though, why a US IP address does not pass the FW if a rule explicitly allows it.
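
Since the error message says "During secondary validation" and the HAProxy log shows requests from several source addresses, it may be worth looking up every one of them rather than only the first. A small sketch, using the location lookup command shown above and the three addresses from the HAProxy log; the function name is made up:

```shell
#!/bin/sh
# Look up the country of each address that requested the challenge file.
# LOOKUP defaults to IPFire's "location lookup"; the function name is hypothetical.
lookup_sources() {
    for ip in "$@"; do
        ${LOOKUP:-location lookup} "$ip" | grep -E "^$ip|Country"
    done
}

# Usage with the addresses from the HAProxy log:
#   lookup_sources 66.133.109.36 34.222.23.129 3.145.10.139
```

Only requests that made it through the firewall appear in the HAProxy log. A validation source that was dropped would show up in the firewall log instead, so those addresses are the more interesting ones to look up.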

nickh · 30 May 2024 18:26

That looks really dangerous. Haven’t you just opened up your firewall to everything from anywhere?

I guess it all depends on the firewall ordering. If that is the only change you’ve made, can you dump the firewall with iptables -nvL with and without the extra rule?

hellfire · 30 May 2024 18:55

I have the same feeling.

Don’t get me wrong, I just want a way through the firewall for the duration of the LE challenge process. The cert renewal is currently done by a cron job that runs on the first day of every month.

So, if I could add an incoming FW rule similar to the one above from within the shell script for the renewal process, and remove the rule after the job is done, that would be the ideal solution at the moment.
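
A minimal sketch of that idea, not a tested IPFire recipe: it inserts ACCEPT rules for TCP ports 80 and 443 on the red interface, runs the renewal command, and deletes the rules again whatever the outcome. The chain name CUSTOMINPUT, the interface name red0 and the renewal command are assumptions and would need to be adapted; whether that chain is evaluated before the location block would also need checking with iptables -nvL.

```shell
#!/bin/sh
# Sketch only. Assumed: chain CUSTOMINPUT, interface red0; function names are made up.
CHAIN="${CHAIN:-CUSTOMINPUT}"
IFACE="${IFACE:-red0}"
IPT="${IPT:-iptables}"

# Insert (-I) or delete (-D) the temporary ACCEPT rules for ports 80 and 443.
toggle_ports() {
    for port in 80 443; do
        $IPT "$1" "$CHAIN" -i "$IFACE" -p tcp --dport "$port" -j ACCEPT
    done
}

# Open the ports, run the given command, close the ports, keep the exit code.
with_open_ports() {
    toggle_ports -I
    "$@"
    status=$?
    toggle_ports -D
    return "$status"
}

# Usage in the cron script (the renewal command is an assumption):
#   with_open_ports dehydrated -c
```

Port 443 is included because the validator follows the http to https redirect. Limiting the rule to two ports and to the duration of the renewal keeps the exposure far smaller than a permanent rule with source "All".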

The ordering is a key part, of course; the rule with the location group was at position one of all incoming FW rules so far. As you already know, this stopped working until I modified the source.

The dumps are attached as a ZIP file, since the forum does not allow such large postings:

iptables dump.zip (13.0 KB)

Thanks for looking into it!

nickh · 30 May 2024 20:31

Your dumps are huge. You seem to run very restrictive outbound policies.

Where is your LE endpoint listening on 5001? Is it in IPF or behind IPF? If it is in IPF, have you tried adding TCP port 5001 to your WWW group?

What is the HAProxy rule for this redirect?

hellfire · 31 May 2024 10:58

Yes, I’ve got many rules in place, that’s true.

The LE endpoint is listening on port 5001, defined by a VHOST within the Apache configuration on IPFire itself.
So the endpoint is definitely on the IPFire machine itself.

I doubt that adding port 5001 to the port list of the FW group WWW is the solution.
As written above, solely switching the source to network RED works, without adding any additional port to the preset WWW.

It has something to do with the fact that the firewall is blocking, or not passing through, the LE IP addresses from source country US.

On the other hand, I discovered that I definitely need such a FW input rule to pass traffic from the source network (RED or some defined countries) to the firewall itself.
Otherwise no traffic gets from the internet to the firewall and HAProxy, and none reaches the VHOST.

This basically leaves two questions:

How do I define a reasonably secure FW rule to pass incoming traffic from a defined external source to the firewall?
Why is the incoming country US not identified as such, so that the traffic does not pass the FW rule?

Michael

nickh · 31 May 2024 11:14

Solely switching the source to Red and removing any port restrictions completely opens your firewall to everywhere. Not a good idea.

For a starter, try adding 5001 to your www group.

What I am after from HAProxy is how the redirect target is defined? Is it defined as localhost, Red or Green?

hellfire · 31 May 2024 17:53

Thanks Nick again for your reply!

Here are some parts of the haproxy.cfg file:

frontend http_https
    mode http
...
   bind 172.17.0.2:80
...
    #---------------------
    #HAProxy handles SSL
    #---------------------
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request redirect scheme https code 301 if !{ ssl_fc }
...
    #----Lets Encrypt certificate renewal----
    #Parse URI to see if it is a letsencrypt request
    acl is_letsencrypt path_beg -i /.well-known/acme-challenge/
...
#---------------------------------------------------------------------
# Backend for Lets Encrypt certificate renewal local Apache web-server
#---------------------------------------------------------------------
backend letsencrypt
    server letsencrypt 127.0.0.1:5001 check

Instead of the localhost IP (127.0.0.1) I’ve additionally tested 192.168.0.1, the gateway address of IPFire itself - nothing changed though.

I’ve added port 5001 to the firewall group WWW, to no avail either.

Btw, right now the LE services seem to be down on their end, which prevents further testing…

nickh · 31 May 2024 19:04

So you are using localhost as the redirect and lo is open in the firewall. Hmm.

At the same time you had an apache log which indicates the inbound message and initial TCP handshake is working.

hellfire · 31 May 2024 19:41

No, not quite correct.

Haproxy is installed on IPFire, the target web server is Apache installed on IPFire, too. It’s the default web server for the WebIF that is configured with a VHOST responding on port 5001 for the LE cert validation requests.
I can reach IPFire’s web server from any browser, if using IPFire’s green IP-address 192.168.0.1 with port 5001: http://192.168.0.1:5001.

Hence, I’m using the localhost (127.0.0.1) and port 5001 for redirecting the incoming cert validation request by Let’s Encrypt to the locally installed Apache server (using a vhost).

The redirection properly works if a specific rule is on position 1 of the incoming firewall rules.

It does not work with the previous rule, containing some countries as the source, including the US.

I do not see the renewal http request in an Apache log (in fact there is none within the WebIF), but I do see it in the HAProxy logs, or in the firewall logs if logging is enabled in the specific FW rule.

nickh · 31 May 2024 20:27

If you watch tcpdump, do you see the redirect to port 5001 i.e. any traffic on it?

hellfire · 1 June 2024 18:09

Hi Nick!

I tried really hard to find the correct command, but either there is no traffic on port 5001 (although LE Debug succeeded, which means the traffic reaches the VHOST through the firewall) or I simply did not find the correct command parameters.

I tried

tcpdump -i red0 port 5001
tcpdump port 5001
tcpdump -i red0 dst port 5001
tcpdump "host 127.0.0.1" port 5001

To no avail: no sign of traffic, and the last command above fails anyway.
Which command would be the correct one? Do you have any hint for me, please?
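
For what it is worth, traffic from HAProxy to 127.0.0.1:5001 never leaves the machine, so it travels over the loopback interface and would not show up on red0. The last command above also lacks an "and" between the two filter expressions. A sketch of a capture that should match; the wrapper function is only there for illustration:

```shell
#!/bin/sh
# Capture HAProxy -> Apache traffic on the loopback interface.
#   -i lo : locally delivered traffic uses lo, not red0
#   -nn   : do not resolve host names or port names
# The function name is made up; TCPDUMP can be overridden for a dry run.
capture_backend() {
    port="${1:-5001}"
    ${TCPDUMP:-tcpdump} -i lo -nn "tcp port $port"
}

# Usage (as root), while the renewal or a test request is running:
#   capture_backend 5001
# Equivalent filter with an explicit host:
#   tcpdump -i lo -nn 'host 127.0.0.1 and port 5001'
```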