Maltfield Log/2024 H1: Difference between revisions
Line 7: | Line 7: | ||
# [[User:Maltfield]] | # [[User:Maltfield]] | ||
# [[Special:Contributions/Maltfield]] | # [[Special:Contributions/Maltfield]] | ||
=Thr February 29, 2024= | |||
# Marcin alerted me on Feb 24 that the wiki was down | |||
## I did a quick test and the frontpage loaded fine. The status page also didn't indicate any downtime | |||
## however, when I tried to do a search on the wiki, it stalled for a while and eventually I got a varnish error saying it couldn't reach the backend | |||
## I tried to check munin, but I got the same varnish error | |||
## I told Marcin that he could try to just give it a reboot from the hetzner WUI, which he did | |||
## after it came back, I checked Munin and saw that the db was down. I also saw a huge spike in apache memory usage | |||
# Over the next few days, I noticed that OSSEC has been spamming me with emails once every 5 minutes | |||
<pre> | |||
OSSEC HIDS Notification. | |||
2024 Feb 27 23:12:07 | |||
Received From: opensourceecology->/var/log/messages | |||
Rule: 1002 fired (level 2) -> "Unknown problem somewhere in the system." | |||
Portion of the log(s): | |||
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
--END OF NOTIFICATION | |||
</pre> | |||
## I'm not certain, but the email might be caused by some OpenVPN cron used to connect the prod & staging servers. It's possible that a certificate expired and that the expiration is causing some loop that's bogging down the server somehow | |||
# Every time Marcin reboots the server, I get an email alert from hetzner | |||
## I noticed that Marcin rebooted it again today (Feb 29) at 10:03 (EST). I sent him an email and he confirmed this | |||
## I also see he did another reboot today at 17:58 | |||
# I ssh'd into the server and checked /var/log/messages. Here's one of the snippets | |||
<pre> | |||
Mar 1 03:38:24 opensourceecology connect.sh: TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 | |||
Mar 1 03:38:24 opensourceecology connect.sh: Socket Buffers: R=[212992->212992] S=[212992->212992] | |||
Mar 1 03:38:24 opensourceecology connect.sh: UDP link local: (not bound) | |||
Mar 1 03:38:24 opensourceecology connect.sh: UDP link remote: [AF_INET]195.201.233.113:1194 | |||
Mar 1 03:38:24 opensourceecology connect.sh: TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=852b79d5 cd86dfc4 | |||
Mar 1 03:38:24 opensourceecology connect.sh: VERIFY OK: depth=1, CN=osedev1 | |||
Mar 1 03:38:24 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
Mar 1 03:38:24 opensourceecology connect.sh: OpenSSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed | |||
Mar 1 03:38:24 opensourceecology connect.sh: TLS_ERROR: BIO read tls_read_plaintext error | |||
Mar 1 03:38:24 opensourceecology connect.sh: TLS Error: TLS object -> incoming plaintext read error | |||
Mar 1 03:38:24 opensourceecology connect.sh: TLS Error: TLS handshake failed | |||
Mar 1 03:38:24 opensourceecology connect.sh: SIGUSR1[soft,tls-error] received, process restarting | |||
Mar 1 03:38:24 opensourceecology connect.sh: Restart pause, 300 second(s) | |||
</pre> | |||
## indeed, the line above what was emailed from OSSEC says <pre>osedev</pre>, so this appears to be related to the staging/dev setup | |||
## running `systemctl` shows that the service <pre>openvpn-client.service</pre> is in the <pre>failed</pre> state | |||
<pre> | |||
[root@opensourceecology cron.d]# systemctl | |||
... | |||
● openvpn-client.service loaded failed failed openvpn-client.service | |||
</pre> | |||
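To get a feel for how often connect.sh is failing, the handshake failures in /var/log/messages can be counted per hour. This is only a rough sketch: the function name <code>handshake_failures_per_hour</code> is made up for illustration, and it assumes the standard syslog line layout seen in the snippet above.

```shell
#!/bin/bash
# Count "TLS handshake failed" lines logged by connect.sh, grouped by hour.
# Reads syslog-style lines on stdin and prints "<count> <Mon> <day> <HH>:00",
# sorted lexically by the hour label.
handshake_failures_per_hour() {
  awk '/connect\.sh/ && /TLS handshake failed/ {
         split($3, t, ":")                  # $3 is HH:MM:SS
         n[$1 " " $2 " " t[1] ":00"]++      # key: month, day, hour
       }
       END { for (k in n) print n[k], k }' | sort -k2
}

# usage (as root on the server):
#   handshake_failures_per_hour < /var/log/messages
```

With the 300-second restart pause shown in the log, roughly 12 failures per hour would be expected (3600 / 300); a much higher count would hint at more than one openvpn process retrying.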
# I found the config in /etc/systemd/system | |||
<pre> | |||
[root@opensourceecology ~]# cd /etc/systemd/system | |||
[root@opensourceecology system]# ls | |||
basic.target.wants getty.target.wants nginx.service.d sysinit.target.wants | |||
default.target local-fs.target.wants openvpn-client.service system-update.target.wants | |||
default.target.wants multi-user.target.wants sockets.target.wants | |||
[root@opensourceecology system]# cat openvpn-client.service | |||
[Unit] | |||
Description=OpenVPN tunnel for %I | |||
After=syslog.target network-online.target | |||
Wants=network-online.target | |||
Documentation=man:openvpn(8) | |||
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage | |||
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO | |||
[Service] | |||
User=root | |||
Type=notify | |||
PrivateTmp=true | |||
WorkingDirectory=/etc/openvpn/client | |||
#WorkingDirectory=/root/openvpn | |||
#ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf | |||
ExecStart=/etc/openvpn/client/connect.sh | |||
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE | |||
#LimitNPROC=10 | |||
LimitNPROC=infinity | |||
DeviceAllow=/dev/null rw | |||
DeviceAllow=/dev/net/tun rw | |||
ProtectSystem=true | |||
ProtectHome=true | |||
KillMode=process | |||
[Install] | |||
WantedBy=multi-user.target | |||
[root@opensourceecology system]# | |||
</pre> | |||
# from above, it looks like the path to the connect.sh script is <pre>/etc/openvpn/client/connect.sh</pre> | |||
<pre> | |||
[root@opensourceecology system]# cd /etc/openvpn/client/ | |||
[root@opensourceecology client]# ls | |||
auth.txt ca.crt client.conf connect.sh hetzner2.crt hetzner2.key ta.key | |||
[root@opensourceecology client]# cat connect.sh | |||
#!/bin/bash | |||
# yes, storing 2fa secret keys here doesn't add security; we're doing it only | |||
# because we can't exclude 2fa on a client-by-client basis in the openvpn | |||
# free server. 2fa is important for humans with bad passwords, not so much for | |||
# our server accounts. If someone can read this file, we've bigger problems. | |||
TOTP_SECRET=OBFUSCATED | |||
USERNAME=OBFUSCATED | |||
token=`oathtool --base32 --totp ${TOTP_SECRET}` | |||
echo -e "${USERNAME}\n${token}" > auth.txt | |||
/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf | |||
[root@opensourceecology client]# | |||
</pre> | |||
# If I manually run the openvpn command from the above script, I get the error | |||
<pre> | |||
[root@opensourceecology client]# /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf | |||
OpenVPN 2.4.12 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Mar 17 2022 | |||
library versions: OpenSSL 1.0.2k-fips 26 Jan 2017, LZO 2.06 | |||
WARNING: Your certificate has expired! | |||
Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication | |||
Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication | |||
TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 | |||
Socket Buffers: R=[212992->212992] S=[212992->212992] | |||
UDP link local: (not bound) | |||
UDP link remote: [AF_INET]195.201.233.113:1194 | |||
TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=11bab8ae 02ebcb41 | |||
VERIFY OK: depth=1, CN=osedev1 | |||
VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
OpenSSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed | |||
TLS_ERROR: BIO read tls_read_plaintext error | |||
TLS Error: TLS object -> incoming plaintext read error | |||
TLS Error: TLS handshake failed | |||
SIGUSR1[soft,tls-error] received, process restarting | |||
Restart pause, 5 second(s) | |||
^CSIGINT[hard,init_instance] received, process exiting | |||
[root@opensourceecology client]# | |||
</pre> | |||
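The "Your certificate has expired!" warning above can be double-checked against the certificate files on disk. Below is a minimal sketch, assuming GNU date and that <code>hetzner2.crt</code> (taken from the directory listing) is the client certificate; the helper name <code>days_left</code> is hypothetical. Note that the <code>VERIFY ERROR: depth=0 ... CN=server</code> line is about the certificate presented by the remote end, which would have to be checked on that server.

```shell
#!/bin/bash
# days_left NOTAFTER_LINE [NOW_EPOCH]
# NOTAFTER_LINE is the output of `openssl x509 -noout -enddate`, for example
#   notAfter=Jan  1 00:00:00 2030 GMT
# Prints the number of whole days until expiry (negative once expired).
# Requires GNU date for the -d option.
days_left() {
  local not_after="${1#notAfter=}"        # strip the "notAfter=" prefix
  local now="${2:-$(date +%s)}"           # default to the current time
  local expires
  expires=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expires - now) / 86400 ))
}

# usage (the file name is an assumption based on the listing above):
#   days_left "$(openssl x509 -noout -enddate -in /etc/openvpn/client/hetzner2.crt)"
```

A negative number would mean the certificate has already expired.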
# that's enlightening, but I'm not sure I understand why this would crash the system | |||
# I checked munin | |||
## again, the db queries suddenly drop to 0 in 3 places, and we can clearly see when the server is broken | |||
## corresponding with all 3 gaps in the db chart, there's a sawtooth vertical climb in httpd rss usage. interesting. | |||
## there's also a huge vertical spike in number of processes at these 3 times | |||
## uptime was 1,400 days before we rebooted :whistles: | |||
### but that's the point: this was a very well-oiled machine. why is running a "connect" script over-and-over bringing the whole thing to its knees? could that really be it? | |||
## there is a huge spike in postfix mail queue. so it's possible that it's actually openvpn that's triggering ossec that's triggering postfix and that's causing issues | |||
# honestly, nobody is using osedev or osestaging. I could renew the certs, but I'm not sure that's warranted atm | |||
# instead, I'll do two things: [1] I'll add an ossec rule not to email alerts for this and [2] I'll disable the openvpn-client unit in systemd | |||
# first we test the current behaviour of ossec with this log entry | |||
<pre> | |||
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest | |||
2024/03/01 04:58:05 ossec-testrule: INFO: Reading local decoder file. | |||
2024/03/01 04:58:05 ossec-testrule: INFO: Started (pid: 12113). | |||
ossec-testrule: Type one log per line. | |||
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
**Phase 1: Completed pre-decoding. | |||
full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
hostname: 'opensourceecology' | |||
program_name: 'connect.sh' | |||
log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
**Phase 2: Completed decoding. | |||
No decoder matched. | |||
**Phase 3: Completed filtering (rules). | |||
Rule id: '1002' | |||
Level: '2' | |||
Description: 'Unknown problem somewhere in the system.' | |||
**Alert to be generated. | |||
</pre> | |||
# next we make a backup of the local rules file | |||
<pre> | |||
[root@opensourceecology client]# cd /var/ossec/rules/ | |||
[root@opensourceecology rules]# ls | |||
apache_rules.xml hordeimp_rules.xml nginx_rules.xml rules_config.xml trend-osce_rules.xml | |||
apparmor_rules.xml ids_rules.xml openbsd_rules.xml sendmail_rules.xml unbound_rules.xml | |||
arpwatch_rules.xml imapd_rules.xml opensmtpd_rules.xml smbd_rules.xml vmpop3d_rules.xml | |||
asterisk_rules.xml local_rules.xml ossec_rules.xml solaris_bsm_rules.xml vmware_rules.xml | |||
attack_rules.xml mailscanner_rules.xml pam_rules.xml sonicwall_rules.xml vpn_concentrator_rules.xml | |||
cimserver_rules.xml mcafee_av_rules.xml php_rules.xml spamd_rules.xml vpopmail_rules.xml | |||
cisco-ios_rules.xml msauth_rules.xml pix_rules.xml squid_rules.xml vsftpd_rules.xml | |||
clam_av_rules.xml ms_dhcp_rules.xml policy_rules.xml sshd_rules.xml web_appsec_rules.xml | |||
courier_rules.xml ms-exchange_rules.xml postfix_rules.xml symantec-av_rules.xml web_rules.xml | |||
dovecot_rules.xml ms_ftpd_rules.xml postgresql_rules.xml symantec-ws_rules.xml wordpress_rules.xml | |||
dropbear_rules.xml ms-se_rules.xml proftpd_rules.xml syslog_rules.xml zeus_rules.xml | |||
firewalld_rules.xml mysql_rules.xml pure-ftpd_rules.xml sysmon_rules.xml | |||
firewall_rules.xml named_rules.xml racoon_rules.xml systemd_rules.xml | |||
ftpd_rules.xml netscreenfw_rules.xml roundcube_rules.xml telnetd_rules.xml | |||
[root@opensourceecology rules]# cp local_rules.xml local_rules.xml.20240229 | |||
</pre> | |||
# I added a new rule to catch this and stop emailing on them | |||
<pre> | |||
[root@opensourceecology rules]# diff local_rules.xml.20240229 local_rules.xml | |||
93a94,99 | |||
> <rule id="100057" level="2"> | |||
> <if_sid>1002</if_sid> | |||
> <match>error=certificate has expired</match> | |||
> <options>no_email_alert</options> | |||
> <description>2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server</description> | |||
> </rule> | |||
[root@opensourceecology rules]# | |||
</pre> | |||
# I restarted ossec and confirmed it matches the new rule. For some reason it still says "Alert to be generated", and I don't understand why. The rule 100057 clearly has "no_email_alert" set | |||
<pre> | |||
[root@opensourceecology ~]# systemctl restart ossec | |||
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest | |||
2024/03/01 05:07:33 ossec-testrule: INFO: Reading local decoder file. | |||
2024/03/01 05:07:33 ossec-testrule: INFO: Started (pid: 15844). | |||
ossec-testrule: Type one log per line. | |||
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
**Phase 1: Completed pre-decoding. | |||
full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
hostname: 'opensourceecology' | |||
program_name: 'connect.sh' | |||
log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
**Phase 2: Completed decoding. | |||
No decoder matched. | |||
**Phase 3: Completed filtering (rules). | |||
Rule id: '100057' | |||
Level: '2' | |||
Description: '2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server' | |||
**Alert to be generated. | |||
</pre> | |||
# ah crap, I just realized that there's tons of different lines that are triggering these emails | |||
<pre> | |||
Feb 29 08:18:02 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=1780983422210014075884459813273922948 | |||
</pre> | |||
<pre> | |||
Feb 29 08:18:02 opensourceecology connect.sh: OpenSSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed | |||
</pre> | |||
<pre> | |||
Feb 29 08:18:02 opensourceecology connect.sh: TLS_ERROR: BIO read tls_read_plaintext error | |||
</pre> | |||
<pre> | |||
Feb 29 08:18:02 opensourceecology connect.sh: TLS Error: TLS object -> incoming plaintext read error | |||
</pre> | |||
<pre> | |||
Feb 29 08:18:02 opensourceecology connect.sh: TLS Error: TLS handshake failed | |||
</pre> | |||
<pre> | |||
Feb 29 08:18:02 opensourceecology connect.sh: SIGUSR1[soft,tls-error] received, process restarting | |||
</pre> | |||
# well, the one thing all these lines have in common is that they contain <pre>opensourceecology connect.sh:</pre>, so I'll match on that instead | |||
<pre> | |||
[root@opensourceecology rules]# diff local_rules.xml.20240229 local_rules.xml | |||
93a94,99 | |||
> <rule id="100057" level="2"> | |||
> <if_sid>1002</if_sid> | |||
> <match>opensourceecology connect.sh: </match> | |||
> <options>no_email_alert</options> | |||
> <description>2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server</description> | |||
> </rule> | |||
[root@opensourceecology rules]# | |||
</pre> | |||
# and a test | |||
<pre> | |||
[root@opensourceecology ~]# systemctl restart ossec | |||
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest | |||
2024/03/01 05:14:57 ossec-testrule: INFO: Reading local decoder file. | |||
2024/03/01 05:14:57 ossec-testrule: INFO: Started (pid: 17861). | |||
ossec-testrule: Type one log per line. | |||
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
**Phase 1: Completed pre-decoding. | |||
full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
hostname: 'opensourceecology' | |||
program_name: 'connect.sh' | |||
log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
**Phase 2: Completed decoding. | |||
No decoder matched. | |||
**Phase 3: Completed filtering (rules). | |||
Rule id: '1002' | |||
Level: '2' | |||
Description: 'Unknown problem somewhere in the system.' | |||
**Alert to be generated. | |||
</pre> | |||
# shit, that didn't work. probably regex syntax on one of the special characters | |||
# ah, the decoder extracted the two fields as <pre>hostname</pre> and <pre>program_name</pre>, and it appears that <pre><match></pre> only applies to the <pre>log</pre> portion of the message. this works | |||
<pre> | |||
[root@opensourceecology rules]# diff local_rules.xml.20240229 local_rules.xml | |||
93a94,100 | |||
> <rule id="100057" level="2"> | |||
> <if_sid>1002</if_sid> | |||
> <hostname>opensourceecology</hostname> | |||
> <program_name>connect.sh</program_name> | |||
> <options>no_email_alert</options> | |||
> <description>2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server</description> | |||
> </rule> | |||
[root@opensourceecology rules]# | |||
</pre> | |||
## but for some reason it still says <pre>Alert to be generated</pre> | |||
<pre> | |||
[root@opensourceecology ~]# systemctl restart ossec | |||
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest | |||
2024/03/01 05:24:57 ossec-testrule: INFO: Reading local decoder file. | |||
2024/03/01 05:24:57 ossec-testrule: INFO: Started (pid: 22484). | |||
ossec-testrule: Type one log per line. | |||
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483 | |||
**Phase 1: Completed pre-decoding. | |||
full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
hostname: 'opensourceecology' | |||
program_name: 'connect.sh' | |||
log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483' | |||
**Phase 2: Completed decoding. | |||
No decoder matched. | |||
**Phase 3: Completed filtering (rules). | |||
Rule id: '100057' | |||
Level: '2' | |||
Description: '2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server' | |||
**Alert to be generated. | |||
</pre> | |||
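Repeating this test by hand after every rule change gets tedious, so the check could be scripted. The sketch below assumes that ossec-logtest accepts log lines on stdin when piped (its output may go to stderr, hence the <code>2>&1</code>); the helper name <code>matched_rule_id</code> is made up. As far as I can tell, "Alert to be generated" only means the event would still be written to the alerts log, while <code>no_email_alert</code> is meant to suppress just the email, so the rule id is the more useful thing to check here.

```shell
#!/bin/bash
# Print the id of the rule that ossec-logtest reports for a log line.
# Reads ossec-logtest output on stdin and extracts the first "Rule id: 'N'".
matched_rule_id() {
  sed -n "s/.*Rule id: '\([0-9]*\)'.*/\1/p" | head -n1
}

# usage (as root on the OSSEC server; assumes piping a line in works as
# it does interactively):
#   line='Feb 27 23:12:07 opensourceecology connect.sh: TLS Error: TLS handshake failed'
#   echo "$line" | /var/ossec/bin/ossec-logtest 2>&1 | matched_rule_id
```

If this prints 100057 for each of the different connect.sh lines, the new rule is catching all of them.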
# I also went ahead and stopped & disabled the openvpn-client service in systemd | |||
<pre> | |||
[root@opensourceecology rules]# systemctl status openvpn-client | |||
● openvpn-client.service | |||
Loaded: loaded (/etc/systemd/system/openvpn-client.service; enabled; vendor preset: disabled) | |||
Active: failed (Result: timeout) since Thu 2024-02-29 22:34:17 UTC; 6h ago | |||
Docs: man:openvpn(8) | |||
https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage | |||
https://community.openvpn.net/openvpn/wiki/HOWTO | |||
Process: 1110 ExecStart=/etc/openvpn/client/connect.sh (code=killed, signal=TERM) | |||
Main PID: 1110 (code=killed, signal=TERM) | |||
CGroup: /system.slice/openvpn-client.service | |||
└─1171 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: UDP link remote: [AF_INET]195.201.233.113:1194 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS: Initial packet from [AF_INET]195.201.233.113:1194...453 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY OK: depth=1, CN=osedev1 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY ERROR: depth=0, error=certificate has expired: ...483 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: OpenSSL: error:14090086:SSL routines:ssl3_get_server_c...led | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS_ERROR: BIO read tls_read_plaintext error | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS object -> incoming plaintext read error | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS handshake failed | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: SIGUSR1[soft,tls-error] received, process restarting | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: Restart pause, 300 second(s) | |||
Hint: Some lines were ellipsized, use -l to show in full. | |||
[root@opensourceecology rules]# systemctl stop openvpn-client | |||
[root@opensourceecology rules]# systemctl disable openvpn-client | |||
Removed symlink /etc/systemd/system/multi-user.target.wants/openvpn-client.service. | |||
[root@opensourceecology rules]# systemctl status openvpn-client | |||
● openvpn-client.service | |||
Loaded: loaded (/etc/systemd/system/openvpn-client.service; disabled; vendor preset: disabled) | |||
Active: failed (Result: timeout) since Thu 2024-02-29 22:34:17 UTC; 6h ago | |||
Docs: man:openvpn(8) | |||
https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage | |||
https://community.openvpn.net/openvpn/wiki/HOWTO | |||
Main PID: 1110 (code=killed, signal=TERM) | |||
CGroup: /system.slice/openvpn-client.service | |||
└─1171 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS: Initial packet from [AF_INET]195.201.233.113:1194...453 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY OK: depth=1, CN=osedev1 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY ERROR: depth=0, error=certificate has expired: ...483 | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: OpenSSL: error:14090086:SSL routines:ssl3_get_server_c...led | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS_ERROR: BIO read tls_read_plaintext error | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS object -> incoming plaintext read error | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS handshake failed | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: SIGUSR1[soft,tls-error] received, process restarting | |||
Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: Restart pause, 300 second(s) | |||
Mar 01 05:29:33 opensourceecology.org systemd[1]: [/etc/systemd/system/openvpn-client.service:2] Failed to re...cess | |||
Hint: Some lines were ellipsized, use -l to show in full. | |||
[root@opensourceecology rules]# | |||
</pre> | |||
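One thing worth double-checking: the second status output above still lists PID 1171 (openvpn) under CGroup after the stop. The unit sets <code>KillMode=process</code>, which as I understand it tells systemd to kill only the main process, so the openvpn child may still be running and retrying. The following is a rough sketch for pulling the leftover PIDs out of the status output; the helper name <code>cgroup_pids</code> is hypothetical and the parsing assumes the status layout shown above.

```shell
#!/bin/bash
# Read `systemctl status UNIT` output on stdin and print the PIDs listed in
# the CGroup tree, i.e. lines whose first field ends in digits and whose
# second field is an absolute path, such as:
#   └─1171 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf
cgroup_pids() {
  awk 'match($1, /[0-9]+$/) && $2 ~ /^\// { print substr($1, RSTART) }'
}

# usage: show whether each leftover PID is still alive
#   for pid in $(systemctl status openvpn-client | cgroup_pids); do
#     ps -o pid,etime,cmd -p "$pid"
#   done
```

If a PID shows up as alive, it would need to be killed by hand for the retries (and the log lines) to actually stop.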
# I sent an email to Marcin, and I'll let that sit overnight and see what the postfix charts look like in the morning | |||
<pre> | |||
I logged into the server and disabled the openvpn-client service (which was responsible for establishing the connection between your osedev, osestaging, and oseprod servers) and added a rule to your ossec config to stop it from sending an email every time this connect.sh script has issues. | |||
More details here: | |||
* https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024 | |||
I'll check it in the morning and if the server doesn't have issues for a week, I guess we can call it fixed. | |||
PS: Your server had been running without a reboot for ~1,400 days before you rebooted it. That's a well-oiled machine. | |||
Cheers, | |||
Michael Altfield | |||
Senior Technology Advisor | |||
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B | |||
Open Source Ecology | |||
www.opensourceecology.org | |||
</pre> | |||
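For the morning check, the postfix queue can also be read directly on the server instead of waiting for the Munin chart. A minimal sketch, assuming the usual <code>postqueue -p</code> summary line ("-- N Kbytes in M Requests." or "Mail queue is empty"); the helper name <code>queue_requests</code> is made up for illustration.

```shell
#!/bin/bash
# Read `postqueue -p` output on stdin and print the number of queued
# messages, taken from the final summary line. Prints 0 when the last line
# is not a summary (for example "Mail queue is empty").
queue_requests() {
  tail -n1 | awk '{
    if ($0 ~ /^-- [0-9]+ Kbytes in [0-9]+ Requests?\.$/) print $(NF-1)
    else print 0
  }'
}

# usage (as root on the server):
#   postqueue -p | queue_requests
```

A number that keeps shrinking over the next hours would suggest the OSSEC email flood has stopped feeding the queue.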
=Sat January 10, 2024= | =Sat January 10, 2024= |
Revision as of 05:39, 1 March 2024
My work log from the year 2024. I intentionally made this verbose to make future admin's work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.
See Also
Thr February 29, 2024
- Marcin alerted me on Feb 24 that the wiki was down
- I did a quick test and the frontpage loaded fine. The status page also didn't indicate any downtime
- however, when I tried to do a search on the wiki, it stalled for a while and eventually I got a varnish error saying it couldn't reach the backend
- I tried to check munin, but I got the same varnish error
- I told Marcin that he could try to just give it a reboot from the hetzner WUI, which he did
- after it came back, I checked Munin and saw that the db was down. I also saw a huge spike in apache memory usage
- Over the next few days, I noticed that OSSEC has been spamming me with emails once every 5 minutes
OSSEC HIDS Notification.
2024 Feb 27 23:12:07
Received From: opensourceecology->/var/log/messages
Rule: 1002 fired (level 2) -> "Unknown problem somewhere in the system."
Portion of the log(s):
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
--END OF NOTIFICATION
- I'm not certain, but the email might be caused by some OpenVPN cron used to connect the prod & staging servers. It's possible that a certificate expired and that the expiration is causing some loop that's bogging down the server somehow
- Every time Marcin reboots the server, I get an email alert from hetzner
- I noticed that Marcin rebooted it again today (Feb 29) at 10:03 (EST). I sent him an email and he confirmed this
- I also see he did another reboot today at 17:58
- I ssh'd into the server and checked /var/log/messages. Here's one of the snippets
Mar 1 03:38:24 opensourceecology connect.sh: TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Mar 1 03:38:24 opensourceecology connect.sh: Socket Buffers: R=[212992->212992] S=[212992->212992]
Mar 1 03:38:24 opensourceecology connect.sh: UDP link local: (not bound)
Mar 1 03:38:24 opensourceecology connect.sh: UDP link remote: [AF_INET]195.201.233.113:1194
Mar 1 03:38:24 opensourceecology connect.sh: TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=852b79d5 cd86dfc4
Mar 1 03:38:24 opensourceecology connect.sh: VERIFY OK: depth=1, CN=osedev1
Mar 1 03:38:24 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
Mar 1 03:38:24 opensourceecology connect.sh: OpenSSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
Mar 1 03:38:24 opensourceecology connect.sh: TLS_ERROR: BIO read tls_read_plaintext error
Mar 1 03:38:24 opensourceecology connect.sh: TLS Error: TLS object -> incoming plaintext read error
Mar 1 03:38:24 opensourceecology connect.sh: TLS Error: TLS handshake failed
Mar 1 03:38:24 opensourceecology connect.sh: SIGUSR1[soft,tls-error] received, process restarting
Mar 1 03:38:24 opensourceecology connect.sh: Restart pause, 300 second(s)
- indeed, the line above what was emailed from OSSEC says osedev, so this appears to be related to the staging/dev setup
- running `systemctl` shows that the service openvpn-client.service is in the failed state
[root@opensourceecology cron.d]# systemctl
...
● openvpn-client.service loaded failed failed openvpn-client.service
- I found the config in /etc/systemd/system
[root@opensourceecology ~]# cd /etc/systemd/system
[root@opensourceecology system]# ls
basic.target.wants getty.target.wants nginx.service.d sysinit.target.wants
default.target local-fs.target.wants openvpn-client.service system-update.target.wants
default.target.wants multi-user.target.wants sockets.target.wants
[root@opensourceecology system]# cat openvpn-client.service
[Unit]
Description=OpenVPN tunnel for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO
[Service]
User=root
Type=notify
PrivateTmp=true
WorkingDirectory=/etc/openvpn/client
#WorkingDirectory=/root/openvpn
#ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf
ExecStart=/etc/openvpn/client/connect.sh
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE
#LimitNPROC=10
LimitNPROC=infinity
DeviceAllow=/dev/null rw
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process
[Install]
WantedBy=multi-user.target
[root@opensourceecology system]#
- from above, it looks like the path to the connect.sh script is /etc/openvpn/client/connect.sh
[root@opensourceecology system]# cd /etc/openvpn/client/
[root@opensourceecology client]# ls
auth.txt ca.crt client.conf connect.sh hetzner2.crt hetzner2.key ta.key
[root@opensourceecology client]# cat connect.sh
#!/bin/bash
# yes, storing 2fa secret keys here doesn't add security; we're doing it only
# because we can't exclude 2fa on a client-by-client basis in the openvpn
# free server. 2fa is important for humans with bad passwords, not so much for
# our server accounts. If someone can read this file, we've bigger problems.
TOTP_SECRET=OBFUSCATED
USERNAME=OBFUSCATED
token=`oathtool --base32 --totp ${TOTP_SECRET}`
echo -e "${USERNAME}\n${token}" > auth.txt
/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf
[root@opensourceecology client]#
- If I manually run the openvpn command from the above script, I get the error
[root@opensourceecology client]# /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf
OpenVPN 2.4.12 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Mar 17 2022
library versions: OpenSSL 1.0.2k-fips 26 Jan 2017, LZO 2.06
WARNING: Your certificate has expired!
Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Socket Buffers: R=[212992->212992] S=[212992->212992]
UDP link local: (not bound)
UDP link remote: [AF_INET]195.201.233.113:1194
TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=11bab8ae 02ebcb41
VERIFY OK: depth=1, CN=osedev1
VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
OpenSSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
TLS_ERROR: BIO read tls_read_plaintext error
TLS Error: TLS object -> incoming plaintext read error
TLS Error: TLS handshake failed
SIGUSR1[soft,tls-error] received, process restarting
Restart pause, 5 second(s)
^CSIGINT[hard,init_instance] received, process exiting
[root@opensourceecology client]#
- that's enlightening, but I'm not sure I understand why this would crash the system
- I checked munin
- again, the db queries suddenly drop to 0 in 3 places, and we can clearly see when the server is broken
- corresponding with all 3 gaps in the db chart, there's a sawtooth vertical climb in httpd rss usage. interesting.
- there's also a huge vertical spike in number of processes at these 3 times
- uptime was 1,400 days before we rebooted :whistles:
- but that's the point: this was a very well-oiled machine. why is running a "connect" script over-and-over bringing the whole thing to its knees? could that really be it?
- there is a huge spike in postfix mail queue. so it's possible that it's actually openvpn that's triggering ossec that's triggering postfix and that's causing issues
- honestly, nobody is using osedev or osestaging. I could renew the certs, but I'm not sure that's warranted atm
- instead, I'll do two things: [1] I'll add an ossec rule not to email alerts for this and [2] I'll disable the openvpn-client unit in systemd
- first we test the current behaviour of ossec with this log entry
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest
2024/03/01 04:58:05 ossec-testrule: INFO: Reading local decoder file.
2024/03/01 04:58:05 ossec-testrule: INFO: Started (pid: 12113).
ossec-testrule: Type one log per line.
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
**Phase 1: Completed pre-decoding.
full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
hostname: 'opensourceecology'
program_name: 'connect.sh'
log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
**Phase 2: Completed decoding.
No decoder matched.
**Phase 3: Completed filtering (rules).
Rule id: '1002'
Level: '2'
Description: 'Unknown problem somewhere in the system.'
**Alert to be generated.
- next we make a backup of the local rules file
[root@opensourceecology client]# cd /var/ossec/rules/ [root@opensourceecology rules]# ls apache_rules.xml hordeimp_rules.xml nginx_rules.xml rules_config.xml trend-osce_rules.xml apparmor_rules.xml ids_rules.xml openbsd_rules.xml sendmail_rules.xml unbound_rules.xml arpwatch_rules.xml imapd_rules.xml opensmtpd_rules.xml smbd_rules.xml vmpop3d_rules.xml asterisk_rules.xml local_rules.xml ossec_rules.xml solaris_bsm_rules.xml vmware_rules.xml attack_rules.xml mailscanner_rules.xml pam_rules.xml sonicwall_rules.xml vpn_concentrator_rules.xml cimserver_rules.xml mcafee_av_rules.xml php_rules.xml spamd_rules.xml vpopmail_rules.xml cisco-ios_rules.xml msauth_rules.xml pix_rules.xml squid_rules.xml vsftpd_rules.xml clam_av_rules.xml ms_dhcp_rules.xml policy_rules.xml sshd_rules.xml web_appsec_rules.xml courier_rules.xml ms-exchange_rules.xml postfix_rules.xml symantec-av_rules.xml web_rules.xml dovecot_rules.xml ms_ftpd_rules.xml postgresql_rules.xml symantec-ws_rules.xml wordpress_rules.xml dropbear_rules.xml ms-se_rules.xml proftpd_rules.xml syslog_rules.xml zeus_rules.xml firewalld_rules.xml mysql_rules.xml pure-ftpd_rules.xml sysmon_rules.xml firewall_rules.xml named_rules.xml racoon_rules.xml systemd_rules.xml ftpd_rules.xml netscreenfw_rules.xml roundcube_rules.xml telnetd_rules.xml [root@opensourceecology rules]# cp local_rules.xml local_rules.xml.20240229
- I added a new rule to catch this and stop emailing on them
[root@opensourceecology rules]# diff local_rules.xml.20240229 local_rules.xml
93a94,99
> <rule id="100057" level="2">
> <if_sid>1002</if_sid>
> <match>error=certificate has expired</match>
> <options>no_email_alert</options>
> <description>2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server</description>
> </rule>
[root@opensourceecology rules]#
- I restarted ossec and confirmed it matches the new rule. For some reason it still says "Alert to be generated", and I don't understand why: rule 100057 clearly has "no_email_alert" set
[root@opensourceecology ~]# systemctl restart ossec
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest
2024/03/01 05:07:33 ossec-testrule: INFO: Reading local decoder file.
2024/03/01 05:07:33 ossec-testrule: INFO: Started (pid: 15844).
ossec-testrule: Type one log per line.
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
**Phase 1: Completed pre-decoding.
       full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
       hostname: 'opensourceecology'
       program_name: 'connect.sh'
       log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
**Phase 2: Completed decoding.
       No decoder matched.
**Phase 3: Completed filtering (rules).
       Rule id: '100057'
       Level: '2'
       Description: '2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server'
**Alert to be generated.
- ah crap, I just realized that there are tons of different lines triggering these emails
Feb 29 08:18:02 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=1780983422210014075884459813273922948
Feb 29 08:18:02 opensourceecology connect.sh: OpenSSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
Feb 29 08:18:02 opensourceecology connect.sh: TLS_ERROR: BIO read tls_read_plaintext error
Feb 29 08:18:02 opensourceecology connect.sh: TLS Error: TLS object -> incoming plaintext read error
Feb 29 08:18:02 opensourceecology connect.sh: TLS Error: TLS handshake failed
Feb 29 08:18:02 opensourceecology connect.sh: SIGUSR1[soft,tls-error] received, process restarting
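A quick way to confirm what the noisy lines have in common is to tally syslog lines by their program tag. This is a minimal sketch, assuming the classic syslog layout seen above (`Mon DD HH:MM:SS host program: message`); the function name `count_by_program` is made up for illustration.

```shell
# Tally syslog lines (read from stdin) by program tag, most frequent first.
# Assumes the classic "Mon DD HH:MM:SS host program: msg" layout, so the
# program tag is the 5th whitespace-separated field.
count_by_program() {
    awk '{
        prog = $5
        # Drop the trailing colon and an optional [pid] suffix.
        sub(/(\[[0-9]+\])?:$/, "", prog)
        print prog
    }' | sort | uniq -c | sort -rn
}

# Example against the live log (path as used earlier in this entry):
#   grep 'Feb 29 08:18' /var/log/messages | count_by_program
```

If a single program such as `connect.sh` dominates the tally, matching on the program name is likely a better filter than matching on any one message.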
- well, the one thing all these lines have in common is that they contain "opensourceecology connect.sh: ", so I'll match on that instead
[root@opensourceecology rules]# diff local_rules.xml.20240229 local_rules.xml
93a94,99
> <rule id="100057" level="2">
> <if_sid>1002</if_sid>
> <match>opensourceecology connect.sh: </match>
> <options>no_email_alert</options>
> <description>2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server</description>
> </rule>
[root@opensourceecology rules]#
- and a test
[root@opensourceecology ~]# systemctl restart ossec
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest
2024/03/01 05:14:57 ossec-testrule: INFO: Reading local decoder file.
2024/03/01 05:14:57 ossec-testrule: INFO: Started (pid: 17861).
ossec-testrule: Type one log per line.
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
**Phase 1: Completed pre-decoding.
       full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
       hostname: 'opensourceecology'
       program_name: 'connect.sh'
       log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
**Phase 2: Completed decoding.
       No decoder matched.
**Phase 3: Completed filtering (rules).
       Rule id: '1002'
       Level: '2'
       Description: 'Unknown problem somewhere in the system.'
**Alert to be generated.
- shit, that didn't work. probably regex syntax on one of the special characters
- ah, the decoder extracted the two fields as hostname and program_name, and it appears that <match> only applies to the log portion of the message. this works
[root@opensourceecology rules]# diff local_rules.xml.20240229 local_rules.xml
93a94,100
> <rule id="100057" level="2">
> <if_sid>1002</if_sid>
> <hostname>opensourceecology</hostname>
> <program_name>connect.sh</program_name>
> <options>no_email_alert</options>
> <description>2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server</description>
> </rule>
[root@opensourceecology rules]#
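The pre-decoding output explains the failed attempt: the hostname and program name are split off before the rule's match is applied to the remainder. The sketch below is only a rough shell approximation of that split, not OSSEC's actual parser; `predecode` is a hypothetical helper and it assumes the classic syslog layout.

```shell
# Approximate OSSEC's pre-decoding of one syslog line ($1) into the three
# fields that ossec-logtest prints in "Phase 1".
predecode() {
    printf '%s\n' "$1" | awk '{
        host = $4
        prog = $5
        # Drop the trailing colon and an optional [pid] suffix.
        sub(/(\[[0-9]+\])?:$/, "", prog)
        # Strip the first five fields; what remains is the "log" portion.
        msg = $0
        for (i = 1; i <= 5; i++) sub(/^ *[^ ]+ +/, "", msg)
        print "hostname=" host
        print "program_name=" prog
        print "log=" msg
    }'
}

# Example:
#   predecode 'Feb 29 08:18:02 opensourceecology connect.sh: TLS Error: TLS handshake failed'
```

The "log" value never contains the string "opensourceecology connect.sh: ", which is consistent with the plain-text match on it failing.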
- but for some reason it still says "Alert to be generated"
[root@opensourceecology ~]# systemctl restart ossec
[root@opensourceecology ~]# /var/ossec/bin/ossec-logtest
2024/03/01 05:24:57 ossec-testrule: INFO: Reading local decoder file.
2024/03/01 05:24:57 ossec-testrule: INFO: Started (pid: 22484).
ossec-testrule: Type one log per line.
Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483
**Phase 1: Completed pre-decoding.
       full event: 'Feb 27 23:12:07 opensourceecology connect.sh: VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
       hostname: 'opensourceecology'
       program_name: 'connect.sh'
       log: 'VERIFY ERROR: depth=0, error=certificate has expired: CN=server, serial=17809834222100140758844598132739229483'
**Phase 2: Completed decoding.
       No decoder matched.
**Phase 3: Completed filtering (rules).
       Rule id: '100057'
       Level: '2'
       Description: '2024-02-29: Ignore OpenVPN cert errors that spam us every 5 minutes and crash the server'
**Alert to be generated.
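A possible explanation, worth verifying against the OSSEC documentation: "Alert to be generated" may only mean the event will be written to the alerts log, while no_email_alert suppresses just the email. One way to check is to count alert records per rule ID and compare that with the emails actually received. This is a sketch under assumptions: `count_alerts_by_rule` is a made-up name, the alerts log is assumed to be in the stock text format, and the path in the usage comment is the default install location rather than something confirmed on this server.

```shell
# Count alert records per rule ID in an OSSEC alerts log read from stdin.
# Assumes the stock text format, where each record contains a line such as:
#   Rule: 1002 (level 2) -> 'Unknown problem somewhere in the system.'
count_alerts_by_rule() {
    awk '$1 == "Rule:" { n[$2]++ } END { for (id in n) print id, n[id] }' | sort -n
}

# Example (default install path, an assumption for this server):
#   count_alerts_by_rule < /var/ossec/logs/alerts/alerts.log
```

If rule 100057 keeps accumulating records there while the emails stop, the rule is doing what was intended.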
- I also went ahead and stopped & disabled the openvpn-client service in systemd
[root@opensourceecology rules]# systemctl status openvpn-client ● openvpn-client.service Loaded: loaded (/etc/systemd/system/openvpn-client.service; enabled; vendor preset: disabled) Active: failed (Result: timeout) since Thu 2024-02-29 22:34:17 UTC; 6h ago Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO Process: 1110 ExecStart=/etc/openvpn/client/connect.sh (code=killed, signal=TERM) Main PID: 1110 (code=killed, signal=TERM) CGroup: /system.slice/openvpn-client.service └─1171 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: UDP link remote: [AF_INET]195.201.233.113:1194 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS: Initial packet from [AF_INET]195.201.233.113:1194...453 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY OK: depth=1, CN=osedev1 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY ERROR: depth=0, error=certificate has expired: ...483 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: OpenSSL: error:14090086:SSL routines:ssl3_get_server_c...led Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS_ERROR: BIO read tls_read_plaintext error Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS object -> incoming plaintext read error Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS handshake failed Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: SIGUSR1[soft,tls-error] received, process restarting Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: Restart pause, 300 second(s) Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology rules]# systemctl stop openvpn-client [root@opensourceecology rules]# systemctl disable openvpn-client Removed symlink /etc/systemd/system/multi-user.target.wants/openvpn-client.service. 
[root@opensourceecology rules]# systemctl status openvpn-client ● openvpn-client.service Loaded: loaded (/etc/systemd/system/openvpn-client.service; disabled; vendor preset: disabled) Active: failed (Result: timeout) since Thu 2024-02-29 22:34:17 UTC; 6h ago Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO Main PID: 1110 (code=killed, signal=TERM) CGroup: /system.slice/openvpn-client.service └─1171 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS: Initial packet from [AF_INET]195.201.233.113:1194...453 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY OK: depth=1, CN=osedev1 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: VERIFY ERROR: depth=0, error=certificate has expired: ...483 Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: OpenSSL: error:14090086:SSL routines:ssl3_get_server_c...led Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS_ERROR: BIO read tls_read_plaintext error Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS object -> incoming plaintext read error Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: TLS Error: TLS handshake failed Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: SIGUSR1[soft,tls-error] received, process restarting Mar 01 05:28:26 opensourceecology.org connect.sh[1110]: Restart pause, 300 second(s) Mar 01 05:29:33 opensourceecology.org systemd[1]: [/etc/systemd/system/openvpn-client.service:2] Failed to re...cess Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology rules]#
- I sent an email to Marcin, and I'll let that sit overnight and see what the postfix charts look like in the morning
I logged into the server and disabled the openvpn-client service (which was responsible for establishing the connection between your osedev, osestaging, and oseprod servers) and added a rule to your ossec config to stop it from sending an email every time this connect.sh script has issues. More details here: * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024 I'll check it in the morning and if the server doesn't have issues for a week, I guess we can call it fixed. PS: Your server had been running without a reboot for ~1,400 days before you rebooted it. That's a well-oiled machine. Cheers, Michael Altfield Senior Technology Advisor PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B Open Source Ecology www.opensourceecology.org
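To check the postfix queue directly instead of waiting for the munin chart, the summary line printed by `postqueue -p` can be parsed. A minimal sketch: `mail_queue_size` is a made-up helper name, and it assumes the usual Postfix summary line ("-- N Kbytes in M Requests.") or the "Mail queue is empty" message.

```shell
# Print the number of queued messages, given `postqueue -p` output on stdin.
mail_queue_size() {
    awk '
        /^-- .* in [0-9]+ Requests?\.$/ { n = $(NF - 1) }
        /^Mail queue is empty/          { n = 0 }
        END { print n + 0 }
    '
}

# Example (run as root on the server):
#   postqueue -p | mail_queue_size
```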
Sat January 20, 2024
- Marcin emailed me yesterday (2024-01-19) asking about cert issues with www.opensourceecology.org
- I confirmed that wiki.opensourceecology.org is fine, but www.opensourceecology.org displays a cert issue in firefox
- checking the cert details in firefox, I see that the following SANs are available. Note that the 'www' subdomain is absent
awstats.opensourceecology.org, fef.opensourceecology.org, forum.opensourceecology.org, microfactory.opensourceecology.org, munin.opensourceecology.org, opensourceecology.org, oswh.opensourceecology.org, phplist.opensourceecology.org, staging.opensourceecology.org, store.opensourceecology.org, wiki.opensourceecology.org
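The same check can be scripted instead of reading the SAN list by eye. A minimal sketch: `has_san` is a hypothetical helper that accepts either a plain comma-separated list like the one above or the "DNS:name" entries that openssl prints.

```shell
# Succeed if hostname $1 appears in a SAN list read from stdin.
# Accepts comma- or whitespace-separated names, with or without the
# "DNS:" prefix used in openssl output.
has_san() {
    tr ', ' '\n\n' | sed 's/^DNS://' | grep -Fqx -- "$1"
}

# Example against the live site (needs network access; the -ext option
# requires OpenSSL 1.1.1 or newer, which may not be what this server ships):
#   openssl s_client -connect www.opensourceecology.org:443 \
#       -servername www.opensourceecology.org </dev/null 2>/dev/null \
#     | openssl x509 -noout -ext subjectAltName \
#     | has_san www.opensourceecology.org && echo present || echo missing
```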
- looks like there's a cron named 'letsencrypt'
[root@opensourceecology cron.d]# ls
0hourly  awstats_generate_static_files  backup_to_backblaze  cacti  letsencrypt  munin  phplist  raid-check  sysstat
[root@opensourceecology cron.d]#
- the log file is /var/log/letsEncryptRenew.log
# once a month, update our letsencrypt cert
20 4 13 * * root /root/bin/letsencrypt/renew.sh &>> /var/log/letsEncryptRenew.log
[root@opensourceecology cron.d]#
- the very bottom of the log file contains this message
IMPORTANT NOTES:
 - The following errors were reported by the server:
   Domain: www.opensourceecology.org
   Type: dns
   Detail: DNS problem: networking error looking up CAA for www.opensourceecology.org
Redirecting to /bin/systemctl reload nginx.service
- I attempted a run now
[root@opensourceecology cron.d]# certbot renew Saving debug log to /var/log/letsencrypt/letsencrypt.log - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/openbuildinginstitute.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cert not yet due for renewal - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/opensourceecology.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cert not yet due for renewal - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - The following certs are not due for renewal yet: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem expires on 2024-04-12 (skipped) /etc/letsencrypt/live/opensourceecology.org/fullchain.pem expires on 2024-04-12 (skipped) No renewals were attempted. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - [root@opensourceecology cron.d]#
- checking the existing cert info, it shows the naked domain in the list, but somehow the 'www' subdomain is missing
[root@opensourceecology cron.d]# certbot certificates Saving debug log to /var/log/letsencrypt/letsencrypt.log - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Found the following certs: Certificate Name: openbuildinginstitute.org Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org Expiry Date: 2024-04-12 03:24:05+00:00 (VALID: 82 days) Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem Certificate Name: opensourceecology.org Domains: opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org Expiry Date: 2024-04-12 03:24:18+00:00 (VALID: 82 days) Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - [root@opensourceecology cron.d]
- I see the 'www' in the config file
[root@opensourceecology cron.d]# cd /etc/letsencrypt [root@opensourceecology cron.d]# [root@opensourceecology letsencrypt]# grep -irl 'wiki.opensourceecology.org' * renewal/opensourceecology.org.conf [root@opensourceecology letsencrypt]# cat renewal/opensourceecology.org.conf # renew_before_expiry = 30 days version = 1.3.0 archive_dir = /etc/letsencrypt/archive/opensourceecology.org cert = /etc/letsencrypt/live/opensourceecology.org/cert.pem privkey = /etc/letsencrypt/live/opensourceecology.org/privkey.pem chain = /etc/letsencrypt/live/opensourceecology.org/chain.pem fullchain = /etc/letsencrypt/live/opensourceecology.org/fullchain.pem # Options used in the renewal process [renewalparams] authenticator = webroot account = OBFUSCATED server = OBFUSCATED allow_subset_of_names = True webroot_map store.opensourceecology.org = /var/www/html/store.opensourceecology.org/htdocs phplist.opensourceecology.org = /var/www/html/phplist.opensourceecology.org/public_html www.opensourceecology.org = /var/www/html/www.opensourceecology.org/htdocs munin.opensourceecology.org = /var/www/html/certbot/htdocs microfactory.opensourceecology.org = /var/www/html/microfactory.opensourceecology.org/htdocs opensourceecology.org = /var/www/html/www.opensourceecology.org/htdocs wiki.opensourceecology.org = /var/www/html/wiki.opensourceecology.org/htdocs fef.opensourceecology.org = /var/www/html/fef.opensourceecology.org/htdocs awstats.opensourceecology.org = /var/www/html/certbot/htdocs forum.opensourceecology.org = /var/www/html/forum.opensourceecology.org/htdocs staging.opensourceecology.org = /var/www/html/staging.opensourceecology.org/htdocs oswh.opensourceecology.org = /var/www/html/oswh.opensourceecology.org/htdocs [root@opensourceecology letsencrypt]#
- I triggered a manual force-renewal
[root@opensourceecology letsencrypt]# certbot renew --force-renewal Saving debug log to /var/log/letsencrypt/letsencrypt.log - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/openbuildinginstitute.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Plugins selected: Authenticator webroot, Installer None Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org Renewing an existing certificate - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - new certificate deployed without reload, fullchain is /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/opensourceecology.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Plugins selected: Authenticator webroot, Installer None Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org Renewing an existing certificate - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - new certificate deployed without reload, fullchain is /etc/letsencrypt/live/opensourceecology.org/fullchain.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Congratulations, all renewals succeeded. The following certs have been renewed: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem (success) /etc/letsencrypt/live/opensourceecology.org/fullchain.pem (success) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - [root@opensourceecology letsencrypt]#
- still the 'www' subdomain is absent
[root@opensourceecology letsencrypt]# certbot certificates Saving debug log to /var/log/letsencrypt/letsencrypt.log - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Found the following certs: Certificate Name: openbuildinginstitute.org Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org Expiry Date: 2024-04-20 00:47:41+00:00 (VALID: 89 days) Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem Certificate Name: opensourceecology.org Domains: opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org Expiry Date: 2024-04-20 00:47:47+00:00 (VALID: 89 days) Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - [root@opensourceecology letsencrypt]#
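To make the mismatch between the renewal config and the issued cert explicit, the names in the config's webroot_map section can be compared with the "Domains:" line from `certbot certificates`. This is a sketch under assumptions: both function names are made up, and the parser assumes every line after the webroot_map header has the form "name = path".

```shell
# List hostnames from the webroot_map section of a certbot renewal conf (stdin).
configured_domains() {
    awk -F' *= *' '
        /webroot_map/       { in_map = 1; next }
        in_map && NF == 2   { print $1 }
    ' | sort
}

# Print names configured in renewal conf $1 that are absent from $2, a
# space-separated list copied from the "Domains:" line of `certbot certificates`.
missing_from_cert() {
    configured_domains < "$1" | while IFS= read -r name; do
        case " $2 " in
            *" $name "*) ;;                    # present in the cert
            *) printf '%s\n' "$name" ;;        # configured but not issued
        esac
    done
}

# Example:
#   missing_from_cert /etc/letsencrypt/renewal/opensourceecology.org.conf \
#       'opensourceecology.org awstats.opensourceecology.org wiki.opensourceecology.org'
```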
- I reloaded the site in the browser; the issue persists
- I checked the cert; it says it was issued today (actually 2024-01-21, which is "today" in UTC but "tomorrow" in my timezone). Anyway, that shows that the cert did, in fact, renew. But it still is missing the subdomain that it is *supposed* to fetch as a SAN, according to the config above. Huh.
- I sent an email to Marcin asking if any changes were made recently
- I checked crt.sh (the cert observatory) to see the history of our certs for the domain opensourceecology.org https://crt.sh/?q=opensourceecology.org
- It looks like we've been using the 'www.opensourceecology.org' SAN in our cert since 2018-03-01 (prior to that it was a cloudflare wildcard), and it suddenly disappeared in the 2024-01-13 renewal. It was present in the one before that, issued 2023-11-13.
- therefore, this issue occurred sometime between 2023-11-13 and 2024-01-13.
- unfortunately, the letsencrypt logs are not timestamped
- the only place a year (YYYY) appears is in the cert's expiry date
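One way to avoid this next time would be to prefix every line of renewal output with a timestamp before it reaches the log. A minimal sketch: `stamp_lines` is a hypothetical helper, and the usage line adapts the existing cron entry rather than quoting anything currently deployed.

```shell
# Prefix every line read from stdin with a UTC timestamp.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$line"
    done
}

# Hypothetical usage inside a wrapper script called from cron (putting it in
# a script avoids having to escape % characters in the crontab line):
#   /root/bin/letsencrypt/renew.sh 2>&1 | stamp_lines >> /var/log/letsEncryptRenew.log
```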
# here's the log from the last renewal (that worked; note the line that lists www.opensourceecology.org) <pre> Saving debug log to /var/log/letsencrypt/letsencrypt.log - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/openbuildinginstitute.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cert is due for renewal, auto-renewing... Non-interactive renewal: random delay of 118.029492405 seconds Plugins selected: Authenticator webroot, Installer None Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org Renewing an existing certificate Performing the following challenges: http-01 challenge for awstats.openbuildinginstitute.org http-01 challenge for openbuildinginstitute.org http-01 challenge for seedhome.openbuildinginstitute.org http-01 challenge for www.openbuildinginstitute.org Waiting for verification... Cleaning up challenges - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - new certificate deployed without reload, fullchain is /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/opensourceecology.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cert is due for renewal, auto-renewing... 
Plugins selected: Authenticator webroot, Installer None Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org Renewing an existing certificate Performing the following challenges: http-01 challenge for awstats.opensourceecology.org http-01 challenge for fef.opensourceecology.org http-01 challenge for forum.opensourceecology.org http-01 challenge for microfactory.opensourceecology.org http-01 challenge for munin.opensourceecology.org http-01 challenge for opensourceecology.org http-01 challenge for oswh.opensourceecology.org http-01 challenge for phplist.opensourceecology.org http-01 challenge for staging.opensourceecology.org http-01 challenge for store.opensourceecology.org http-01 challenge for wiki.opensourceecology.org http-01 challenge for www.opensourceecology.org Waiting for verification... Cleaning up challenges - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - new certificate deployed without reload, fullchain is /etc/letsencrypt/live/opensourceecology.org/fullchain.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Congratulations, all renewals succeeded. The following certs have been renewed: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem (success) /etc/letsencrypt/live/opensourceecology.org/fullchain.pem (success) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Redirecting to /bin/systemctl reload nginx.service
- there's one entry after that where it said the certs aren't up for renewal; I'll skip that
- the entry after that is where it renewed, but didn't fetch www.opensourceecology.org due to some (temp?) DNS network issue
Saving debug log to /var/log/letsencrypt/letsencrypt.log - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/openbuildinginstitute.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cert is due for renewal, auto-renewing... Non-interactive renewal: random delay of 236.7908044 seconds Plugins selected: Authenticator webroot, Installer None Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org Renewing an existing certificate Performing the following challenges: http-01 challenge for awstats.openbuildinginstitute.org http-01 challenge for openbuildinginstitute.org http-01 challenge for seedhome.openbuildinginstitute.org http-01 challenge for www.openbuildinginstitute.org Waiting for verification... Cleaning up challenges - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - new certificate deployed without reload, fullchain is /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Processing /etc/letsencrypt/renewal/opensourceecology.org.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Cert is due for renewal, auto-renewing... 
Plugins selected: Authenticator webroot, Installer None Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org Renewing an existing certificate Performing the following challenges: http-01 challenge for awstats.opensourceecology.org http-01 challenge for fef.opensourceecology.org http-01 challenge for forum.opensourceecology.org http-01 challenge for microfactory.opensourceecology.org http-01 challenge for munin.opensourceecology.org http-01 challenge for opensourceecology.org http-01 challenge for oswh.opensourceecology.org http-01 challenge for phplist.opensourceecology.org http-01 challenge for staging.opensourceecology.org http-01 challenge for store.opensourceecology.org http-01 challenge for wiki.opensourceecology.org http-01 challenge for www.opensourceecology.org Waiting for verification... Challenge failed for domain www.opensourceecology.org http-01 challenge for www.opensourceecology.org Cleaning up challenges - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - new certificate deployed without reload, fullchain is /etc/letsencrypt/live/opensourceecology.org/fullchain.pem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Congratulations, all renewals succeeded. The following certs have been renewed: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem (success) /etc/letsencrypt/live/opensourceecology.org/fullchain.pem (success) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - IMPORTANT NOTES: - The following errors were reported by the server: Domain: www.opensourceecology.org Type: dns Detail: DNS problem: networking error looking up CAA for www.opensourceecology.org Redirecting to /bin/systemctl reload nginx.service
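Note that certbot still reported "all renewals succeeded" here even though one name was dropped, which fits the allow_subset_of_names = True setting in the renewal config. Because of that, the failure is easy to miss unless the log is searched for it explicitly. A minimal sketch, with `failed_challenges` as a made-up helper name:

```shell
# Print each domain whose challenge failed, given certbot output on stdin.
failed_challenges() {
    sed -n 's/^.*Challenge failed for domain \([^ ]*\).*$/\1/p' | sort -u
}

# Example against the renewal log used by the cron job:
#   failed_challenges < /var/log/letsEncryptRenew.log
```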
- there's a recent thread with this error message on the Let's Encrypt forums. It was asked 2023-12-20, which is right in our timeline of when this issue hit OSE https://community.letsencrypt.org/t/networking-error-when-letsencrypt-is-looking-up-caa/210483
- there's also a recent bug report filed last week (2024-01-15) https://github.com/cert-manager/cert-manager/issues/6640
- neither of the two threads above was helpful. I'm just going to try to upgrade certbot; maybe there's some old bug that was fixed
- currently we're running certbot v1.3.0-1
[root@opensourceecology letsencrypt]# rpm -qa | grep -i certbot
python2-certbot-1.3.0-1.el7.noarch
certbot-1.3.0-1.el7.noarch
[root@opensourceecology letsencrypt]#
- I upgraded to v1.11.0-2
[root@opensourceecology letsencrypt]# yum install certbot Loaded plugins: fastestmirror, replace Loading mirror speeds from cached hostfile * base: mirror.fra1.de.leaseweb.net * epel: mirror.de.leaseweb.net * extras: mirror.checkdomain.de * updates: mirror.checkdomain.de Resolving Dependencies --> Running transaction check ---> Package certbot.noarch 0:1.3.0-1.el7 will be updated ---> Package certbot.noarch 0:1.11.0-2.el7 will be an update --> Processing Dependency: python2-certbot = 1.11.0-2.el7 for package: certbot-1.11.0-2.el7.noarch --> Running transaction check ---> Package python2-certbot.noarch 0:1.3.0-1.el7 will be updated ---> Package python2-certbot.noarch 0:1.11.0-2.el7 will be an update --> Processing Dependency: python2-acme >= 1.8.0 for package: python2-certbot-1.11.0-2.el7.noarch --> Running transaction check ---> Package python2-acme.noarch 0:1.3.0-1.el7 will be updated ---> Package python2-acme.noarch 0:1.11.0-1.el7 will be an update --> Finished Dependency Resolution Dependencies Resolved ================================================================================================================================== Package Arch Version Repository Size ================================================================================================================================== Updating: certbot noarch 1.11.0-2.el7 epel 47 k Updating for dependencies: python2-acme noarch 1.11.0-1.el7 epel 83 k python2-certbot noarch 1.11.0-2.el7 epel 386 k Transaction Summary ================================================================================================================================== Upgrade 1 Package (+2 Dependent packages) Total download size: 515 k Is this ok [y/d/N]: y Downloading packages: Delta RPMs disabled because /usr/bin/applydeltarpm not installed. 
(1/3): certbot-1.11.0-2.el7.noarch.rpm | 47 kB 00:00:00 (2/3): python2-acme-1.11.0-1.el7.noarch.rpm | 83 kB 00:00:00 (3/3): python2-certbot-1.11.0-2.el7.noarch.rpm | 386 kB 00:00:00 -------------------- Total 1.9 MB/s | 515 kB 00:00:00 Running transaction check Running transaction test Transaction test succeeded Running transaction Updating : python2-acme-1.11.0-1.el7.noarch 1/6 Updating : python2-certbot-1.11.0-2.el7.noarch 2/6 Updating : certbot-1.11.0-2.el7.noarch 3/6 Cleanup : certbot-1.3.0-1.el7.noarch 4/6 Cleanup : python2-certbot-1.3.0-1.el7.noarch 5/6 Cleanup : python2-acme-1.3.0-1.el7.noarch 6/6 Verifying : certbot-1.11.0-2.el7.noarch 1/6 Verifying : python2-acme-1.11.0-1.el7.noarch 2/6 Verifying : python2-certbot-1.11.0-2.el7.noarch 3/6 Verifying : python2-certbot-1.3.0-1.el7.noarch 4/6 Verifying : certbot-1.3.0-1.el7.noarch 5/6 Verifying : python2-acme-1.3.0-1.el7.noarch 6/6 Updated: certbot.noarch 0:1.11.0-2.el7 Dependency Updated: python2-acme.noarch 0:1.11.0-1.el7 python2-certbot.noarch 0:1.11.0-2.el7 Complete! [root@opensourceecology letsencrypt]#
- I did another force renewal
[root@opensourceecology letsencrypt]# certbot certificates
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Found the following certs:
  Certificate Name: openbuildinginstitute.org
    Serial Number: 47c22e806946a20e6a574394ef507d0aff2
    Key Type: RSA
    Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org
    Expiry Date: 2024-04-20 00:47:41+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem
  Certificate Name: opensourceecology.org
    Serial Number: 4c27188b056dad08bff1b624995a3774a61
    Key Type: RSA
    Domains: opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org
    Expiry Date: 2024-04-20 00:47:47+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[root@opensourceecology letsencrypt]#
[root@opensourceecology letsencrypt]# certbot renew --force-renew
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/openbuildinginstitute.org.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Plugins selected: Authenticator webroot, Installer None
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Renewing an existing certificate for www.openbuildinginstitute.org and 3 more domains
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed without reload, fullchain is /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/opensourceecology.org.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Plugins selected: Authenticator webroot, Installer None
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Renewing an existing certificate for opensourceecology.org and 10 more domains
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed without reload, fullchain is /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations, all renewals succeeded:
  /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem (success)
  /etc/letsencrypt/live/opensourceecology.org/fullchain.pem (success)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[root@opensourceecology letsencrypt]#
- sadly, there's still no 'www' subdomain
[root@opensourceecology letsencrypt]# certbot certificates
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Found the following certs:
  Certificate Name: openbuildinginstitute.org
    Serial Number: 3a18ce304580629bd0d83fe28d9285b492f
    Key Type: RSA
    Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org
    Expiry Date: 2024-04-20 01:17:10+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem
  Certificate Name: opensourceecology.org
    Serial Number: 4c559a2d15c6bea75c5e7bb65170fbe2d9c
    Key Type: RSA
    Domains: opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org
    Expiry Date: 2024-04-20 01:17:15+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[root@opensourceecology letsencrypt]#
- I think I should try to just explicitly tell certbot to update its config. the process for this is documented in the wiki here https://wiki.opensourceecology.org/wiki/Web_server_configuration#https
- here's the command as-stated in the wiki (doc rot)
certbot -nv --expand --cert-name opensourceecology.org certonly -v --webroot -w /var/www/html/fef.opensourceecology.org/htdocs/ -d fef.opensourceecology.org -w /var/www/html/www.opensourceecology.org/htdocs -d www.opensourceecology.org -w /var/www/html/oswh.opensourceecology.org/htdocs/ -d oswh.opensourceecology.org -w /var/www/html/forum.opensourceecology.org/htdocs -d forum.opensourceecology.org
/bin/chmod 0400 /etc/letsencrypt/archive/*/pri*
nginx -t && service nginx reload
- here's the updated command
certbot -nv --expand --cert-name opensourceecology.org certonly -v --webroot -w /var/www/html/fef.opensourceecology.org/htdocs/ -d fef.opensourceecology.org -w /var/www/html/www.opensourceecology.org/htdocs -d www.opensourceecology.org -d opensourceecology.org -w /var/www/html/oswh.opensourceecology.org/htdocs/ -d oswh.opensourceecology.org -w /var/www/html/forum.opensourceecology.org/htdocs -d forum.opensourceecology.org -w /var/www/html/store.opensourceecology.org/htdocs -d store.opensourceecology.org -w /var/www/html/phplist.opensourceecology.org/public_html -d phplist.opensourceecology.org -w /var/www/html/certbot/htdocs -d munin.opensourceecology.org -d awstats.opensourceecology.org -w /var/www/html/microfactory.opensourceecology.org/htdocs -d microfactory.opensourceecology.org -w /var/www/html/wiki.opensourceecology.org/htdocs -d wiki.opensourceecology.org -w /var/www/html/staging.opensourceecology.org/htdocs -d staging.opensourceecology.org
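- note: a one-line command this long is easy to let rot again. As a minimal sketch (not what was run here), the argument list could be assembled from a table with one row per webroot followed by the domains served from it. The function name `build_certbot_args` is hypothetical, only two of the rows from the command above are shown, and paths containing spaces are not handled. It only prints the arguments for review; nothing is executed.

```shell
#!/bin/sh
# Sketch: build the certonly argument string from rows of
#   <webroot> <domain> [<domain> ...]
# read on stdin. build_certbot_args is a hypothetical helper name.
build_certbot_args() {
  args="-nv --expand --cert-name opensourceecology.org certonly -v --webroot"
  while read -r webroot domains; do
    # skip blank rows
    [ -n "$webroot" ] || continue
    args="$args -w $webroot"
    # every domain on the row is validated via the same webroot
    for domain in $domains; do
      args="$args -d $domain"
    done
  done
  printf '%s\n' "$args"
}

# Two rows copied from the command above; the other rows follow the same pattern.
build_certbot_args <<'EOF'
/var/www/html/www.opensourceecology.org/htdocs www.opensourceecology.org opensourceecology.org
/var/www/html/certbot/htdocs munin.opensourceecology.org awstats.opensourceecology.org
EOF
```

The printed string would be appended to `certbot` after checking it by eye. Keeping the table in one file means adding a subdomain is a one-row change.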
- gave it a run
[root@opensourceecology letsencrypt]# certbot -nv --expand --cert-name opensourceecology.org certonly -v --webroot -w /var/www/html/fef.opensourceecology.org/htdocs/ -d fef.opensourceecology.org -w /var/www/html/www.opensourceecology.org/htdocs -d www.opensourceecology.org -d opensourceecology.org -w /var/www/html/oswh.opensourceecology.org/htdocs/ -d oswh.opensourceecology.org -w /var/www/html/forum.opensourceecology.org/htdocs -d forum.opensourceecology.org -w /var/www/html/store.opensourceecology.org/htdocs -d store.opensourceecology.org -w /var/www/html/phplist.opensourceecology.org/public_html -d phplist.opensourceecology.org -w /var/www/html/certbot/htdocs -d munin.opensourceecology.org -d awstats.opensourceecology.org -w /var/www/html/microfactory.opensourceecology.org/htdocs -d microfactory.opensourceecology.org -w /var/www/html/wiki.opensourceecology.org/htdocs -d wiki.opensourceecology.org -w /var/www/html/staging.opensourceecology.org/htdocs -d staging.opensourceecology.org
...
Writing new config /etc/letsencrypt/renewal/opensourceecology.org.conf.new.
Reporting to user: Congratulations! Your certificate and chain have been saved at:
/etc/letsencrypt/live/opensourceecology.org/fullchain.pem
Your key file has been saved at:
/etc/letsencrypt/live/opensourceecology.org/privkey.pem
Your certificate will expire on 2024-04-20. To obtain a new or tweaked
version of this certificate in the future, simply run certbot
again. To non-interactively renew *all* of your certificates, run
"certbot renew"
Reporting to user: If you like Certbot, please consider supporting our work by:
Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
Donating to EFF:                    https://eff.org/donate-le
IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/opensourceecology.org/privkey.pem
   Your certificate will expire on 2024-04-20. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot
   again. To non-interactively renew *all* of your certificates, run
   "certbot renew"
 - If you like Certbot, please consider supporting our work by:
   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le
[root@opensourceecology letsencrypt]#
- finally, the 'www' subdomain is in the list now
[root@opensourceecology letsencrypt]# certbot certificates
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Found the following certs:
  Certificate Name: openbuildinginstitute.org
    Serial Number: 3a18ce304580629bd0d83fe28d9285b492f
    Key Type: RSA
    Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org
    Expiry Date: 2024-04-20 01:17:10+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem
  Certificate Name: opensourceecology.org
    Serial Number: 30ec9a5c7a9a9603d3f2c5bd943be6d8d75
    Key Type: RSA
    Domains: fef.opensourceecology.org awstats.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org www.opensourceecology.org
    Expiry Date: 2024-04-20 01:28:35+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[root@opensourceecology letsencrypt]#
- refreshing the page doesn't work; let's try the next two documented commands
[root@opensourceecology letsencrypt]# /bin/chmod 0400 /etc/letsencrypt/archive/*/pri*
[root@opensourceecology letsencrypt]# nginx -t && service nginx reload
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Redirecting to /bin/systemctl reload nginx.service
[root@opensourceecology letsencrypt]#
- well looks like we're using some deprecated 'ssl' option in our nginx configs, but otherwise that looked good
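- note: the warning refers to the standalone `ssl on;` directive, which nginx replaced with the `ssl` parameter on `listen` (for example `listen 443 ssl;`). A minimal sketch for locating the old form follows. It assumes that line 11 of the two include files named in the warnings holds an `ssl on;` line, which was not verified here, and `find_deprecated_ssl` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print file:line:text for every standalone "ssl on;" or "ssl off;"
# directive in the given files. Directives such as ssl_certificate are not
# matched, because whitespace is required right after "ssl".
find_deprecated_ssl() {
  grep -nHE '^[[:space:]]*ssl[[:space:]]+(on|off)[[:space:]]*;' "$@"
}

# Usage (paths taken from the warnings above):
#   find_deprecated_ssl /etc/nginx/conf.d/ssl.*.include
```

After editing, `nginx -t` should be re-run before any reload, as in the transcript above.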
- I refreshed the web browser, and now the cert errors are gone :D
- I tested all of the other subdomains; I did not get any errors
- for good measure (to make sure it won't break again on next renewal), I forced one more renewal
[root@opensourceecology letsencrypt]# certbot renew --force-renew
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/openbuildinginstitute.org.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Plugins selected: Authenticator webroot, Installer None
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Renewing an existing certificate for www.openbuildinginstitute.org and 3 more domains
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed without reload, fullchain is /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/opensourceecology.org.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Plugins selected: Authenticator webroot, Installer None
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Renewing an existing certificate for fef.opensourceecology.org and 11 more domains
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed without reload, fullchain is /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations, all renewals succeeded:
  /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem (success)
  /etc/letsencrypt/live/opensourceecology.org/fullchain.pem (success)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[root@opensourceecology letsencrypt]#
- I confirmed that the 'www' subdomain is, in fact, still present. Success!
[root@opensourceecology letsencrypt]# certbot certificates
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Found the following certs:
  Certificate Name: openbuildinginstitute.org
    Serial Number: 30f44a6ec177f0c7c24b8d450aa9e57ef36
    Key Type: RSA
    Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org
    Expiry Date: 2024-04-20 01:35:00+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem
  Certificate Name: opensourceecology.org
    Serial Number: 360f3ea6ad9910ecaff89dc9725efd8f1a9
    Key Type: RSA
    Domains: fef.opensourceecology.org awstats.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org www.opensourceecology.org
    Expiry Date: 2024-04-20 01:35:05+00:00 (VALID: 89 days)
    Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[root@opensourceecology letsencrypt]#
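- note: `certbot certificates` reports what is on disk, not what nginx is serving. As a complementary check, here is a minimal sketch that asks a host for its live certificate and lists the DNS names in it. The helper names (`extract_sans`, `served_sans`, `check_host`) are hypothetical, wildcard names are not matched, and it assumes outbound access to port 443.

```shell
#!/bin/sh
# Sketch: pull the DNS entries out of `openssl x509 -noout -text` output
# read on stdin, one name per line.
extract_sans() {
  grep -o 'DNS:[^,[:space:]]*' | sed 's/^DNS://'
}

# Fetch the certificate that host $1 serves on port 443 (with SNI) and list its names.
served_sans() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -text \
    | extract_sans
}

# Succeeds only if host $1 appears verbatim in its own certificate's names.
check_host() {
  served_sans "$1" | grep -qxF "$1"
}

# Usage:
#   for h in www wiki forum; do
#     check_host "$h.opensourceecology.org" || echo "MISSING: $h"
#   done
```

This would have flagged the missing 'www' name from outside the server, independent of certbot's own bookkeeping.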
- In conclusion, it appears that there's a recent & known bug in certbot where DNS errors cause a renewal to fail with some very small probability. This broke renewal for just one of our subdomains ('www') sometime between 2023-11-13 and 2024-01-13 https://github.com/cert-manager/cert-manager/issues/6640
- I think I may have also discovered some other unknown bug where ^ this bug causes certbot to forget about a defined subdomain on subsequent renewals. But I don't have enough evidence of this yet to submit a proper bug report
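- note: to gather evidence for a dropped-subdomain bug like this, the domain list could be compared against an expected list after every renewal. A minimal sketch, assuming the `certbot certificates` output format shown in this log; `cert_domains` and `missing_domains` are hypothetical helper names, and the expected list would have to be maintained by hand.

```shell
#!/bin/sh
# Sketch: read `certbot certificates` output on stdin and print the
# domains listed for the certificate named $1, one per line.
cert_domains() {
  awk -v name="$1" '
    $1 == "Certificate" && $2 == "Name:" { in_cert = ($3 == name) }
    in_cert && $1 == "Domains:" { for (i = 2; i <= NF; i++) print $i }
  '
}

# Print every expected domain ($2 onwards) that cert $1 does not list.
missing_domains() {
  name="$1"; shift
  have="$(cert_domains "$name")"
  for d in "$@"; do
    printf '%s\n' "$have" | grep -qxF "$d" || printf '%s\n' "$d"
  done
}

# Usage:
#   certbot certificates 2>/dev/null \
#     | missing_domains opensourceecology.org www.opensourceecology.org wiki.opensourceecology.org
```

Any output means a domain was dropped, and the output plus the renewal log from that run would be the material for a bug report.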
- the fix was to manually construct and re-run the `certbot -nv --expand --cert-name opensourceecology.org certonly -v --webroot -w ...` command && chmod && nginx reload (see above)
- I sent an email to Marcin about this root cause & resolution
Hey Marcin,

I fixed the certbot config.

It looks like there's some sort of bug (reported to Let's Encrypt GitHub last week) that causes some domain renewals to fail due to DNS issues at some highly improbable likelihood. This happened for the 'www.opensourceecology.org' subdomain sometime between 2023-11-13 and 2024-01-13. And, for some reason, this prevented subsequent renewals from attempting to renew that subdomain.

I fixed it by refreshing the config for the opensourceecology.org subdomain. I tested that subsequent renewals will work as well.

For more information, see my log entry:
* https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024#Sat_January_10.2C_2024

Please let me know if you have any further questions.

Thank you,
Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B
Open Source Ecology
www.opensourceecology.org