Maltfield Log/2024 Q3


My work log from the third quarter of the year 2024. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.

See Also

  1. Maltfield_Log
  2. User:Maltfield
  3. Special:Contributions/Maltfield

Tue July 30, 2024

  1. Marcin gave me the go-ahead to order a "hetzner3" server and begin provisioning it with Debian in preparation to migrate all our sites from the CentOS7 hetzner2 server to this new server
  2. This is going to be an enormous project. When I did the hetzner1 -> hetzner2 migration, I inherited both systems (in 2017). For some reason the websites were split across both servers (plus dreamhost too, iirc?), but I consolidated everything onto "hetzner2" and canceled "hetzner1" in 2018 https://wiki.opensourceecology.org/index.php?title=OSE_Server&oldid=298909#Assessment_of_Server_Options
  3. I'll be using ansible to assist in provisioning this server (and hopefully make it easier to provision future servers). Marcin expressed interest in lowering this barrier for others, as well.
    1. I noticed that 5 years ago I created a repo for OSE's ansible playbooks, but it's empty.
    2. I just added the LICENSE to this repo, and I plan to use it to publish our ansible roles/playbooks
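Roughly how I plan to bootstrap that repo (untested sketch; the role and inventory names below are hypothetical placeholders):

  # scaffold a standard role skeleton for the new server
  ansible-galaxy init --init-path roles hetzner3
  # a top-level site.yml would apply that role to the new host; dry-run it with:
  ansible-playbook -i inventory.ini --check site.yml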
  4. First thing I need to do is decide which server to buy from hetzner's dedicated server offerings
    1. holy crap, not only are their server auctions *much* cheaper per month, they also don't have a one-time setup fee (usually ~$50-$200?)
  5. I've written pretty extensively in the past about what specs I'd be looking to get in a future OSE Server migration https://wiki.opensourceecology.org/index.php?title=OSE_Server&oldid=298909#OSE_Server_and_Server_Requirements
    1. In 2018, I said we'd want min 2-4 cores
    2. In 2018, I said we'd want min 8-16 G RAM
    3. In 2018, I said we'd want min ~200G disk
  6. Honestly, I expect that the lowest offerings of a dedicated server in 2024 are probably going to suffice for us, but what I'm mostly concerned about is the disk.
    1. even last week when I did the yum updates, I nearly filled the disk just by extracting a copy of our backups. Currently we have two 250G disks in a software RAID-1 (mirror) array. That gives us a usable 197G
    2. it's also provisioned with all the data on '/'. It would be smart if we set up LVM on the new server (rough sketch below)
    3. It's important to me that we at least double this, but I'll see if there are any deals on 1TB disks or larger
    4. also, what we currently have is a 6 Gb/s SSD, so I don't want to downgrade that by going to a spinning-disk HDD. NVMe might be a welcome upgrade. I/O wait is probably a bottleneck, but not currently one that's causing us agony
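Roughly the kind of LVM layout I have in mind for the new box (untested sketch; device names and sizes are hypothetical, assuming the usual mdadm RAID-1 underneath):

  # assuming /dev/md2 is the big RAID-1 array on the new server
  pvcreate /dev/md2
  vgcreate vg0 /dev/md2
  lvcreate -L 50G -n root vg0
  lvcreate -L 300G -n var vg0
  mkfs.ext4 /dev/vg0/root
  mkfs.ext4 /dev/vg0/var
  # leave the rest of the VG unallocated so volumes can be grown later with lvextend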
  7. I spent some time reviewing the munin graphs
    1. load rarely touches 3. Most of the time it hovers between 0.2 and 1. So I agree that 4 cores is fine for us now.
      1. most of these auctions have an Intel Core i7-4770, which is a 4-core / 8-thread proc. That should be fine.
    2. somehow our varnish cache hit rate is way down. It used to average >80%, but currently it's down to 28-44%
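One way to spot-check the hit rate directly on the server (rather than eyeballing munin) is to compare the varnish hit/miss counters; a minimal sketch (field names assume Varnish 4+):

  # cumulative counters since varnishd started, not a time window
  varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss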
  8. I documented these charts and my findings on a new Hetzner3 page
  9. I looked through the listings in the server auctions
    1. I don't want one that's only 32G RAM (few of these are)
    2. It looks like some have "2 x SSD SATA 250 GB" and some have "2 x SSD M.2 NVMe 512 GB". If we can, let's get the NVMe disks with better I/O
    3. there is one with "2 x HDD SATA 2,0 TB Enterprise". More space would be nice, but not at the sacrifice of I/O
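If we want to actually quantify the SATA SSD vs NVMe difference once a box is handed over, a quick-and-dirty benchmark from the rescue system could look something like this (device and file names are hypothetical):

  # sequential read speed (cached vs buffered disk reads)
  hdparm -tT /dev/nvme0n1
  # random 4k reads, which is where NVMe really pulls ahead of SATA
  fio --name=randread --filename=/tmp/fio.test --size=1G --rw=randread --bs=4k --direct=1 --runtime=30 --time_based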
  10. questions I have for hetzner:
    1. how many disk slots are there? Can we add more disks in the future?
    2. by default, do all these systems have RAID-1? Do we have other RAID options?
    3. oh, actually, there was only one server available for less than 38 EUR/mo that had the 2x 512GB NVMe
    4. I went ahead and ordered it
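The disk-slot question still needs an answer from Hetzner support, but the disk and RAID layout the server ships with can be checked from the rescue system; a minimal sketch:

  # what disks are present, and are they rotational (ROTA=1) or SSD/NVMe (ROTA=0)?
  lsblk -o NAME,SIZE,TYPE,ROTA,MOUNTPOINT
  # is there an existing mdadm software RAID, and what level is it?
  cat /proc/mdstat
  mdadm --detail /dev/md0   # md device name is hypothetical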
  11. I also sent a separate message to hetzner sales asking them for detailed info about the different read & write speeds of their HDD, SSD, and NVMe offerings in dedicated servers
  12. I sent an email to Marcin
Hey Marcin,

I just ordered a dedicated server from Hetzner with the following specs:

* Intel Core i7-6700
* 2x SSD M.2 NVMe 512 GB
* 4x RAM 16384 MB DDR4
* NIC 1 Gbit Intel I219-LM
* Location: Germany, FSN1
* Rescue system (English)
* 1 x Primary IPv4

While they had plenty of servers available with the i7-6700 and 16G of RAM, they only had one with 2x 512 GB NVMe disks (the others were just "SSD" disks). Those NVMe disks should give us a performance boost, so I snagged it while it was available.

I did some reviews of our munin charts to determine our hetzner3 server's needs. For more info, see

 * https://wiki.opensourceecology.org/wiki/Hetzner3

Please let me know if you have any questions about this server.


Thank you,

Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org

Fri July 26, 2024

  1. I started the CHG-2024-07-26_yum_update today at 11:00
  2. pre-state proof shows we have lots of outdated system packages, as expected
[root@opensourceecology ~]# yum list updates
...
xz-libs.x86_64                        5.2.2-2.el7_9                       updates
yum.noarch                            3.4.3-168.el7.centos                base   
yum-cron.noarch                       3.4.3-168.el7.centos                base   
yum-plugin-fastestmirror.noarch       1.1.31-54.el7_8                     base   
yum-utils.noarch                      1.1.31-54.el7_8                     base   
zlib.x86_64                           1.2.7-21.el7_9                      updates
[root@opensourceecology ~]# 
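Not something I captured above, but a rough way to count how many updates are pending (the number yum actually installs will be higher once dependencies get pulled in):

  # approximate count; the output also includes a few header/wrapped lines
  yum -q check-update | wc -l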
  1. I tried to check the backups log, but it was empty :/
[root@opensourceecology ~]# cat /var/log/backups/backup.log
[root@opensourceecology ~]# 
  1. ok, looks like it rotated already; this file shows a 20.424G backup file successfully uploaded to backblaze with rclone
[root@opensourceecology ~]# ls /var/log/backups/
backup.lo               backup.log-20240628.gz  backup.log-20240714.gz
backup.log              backup.log-20240629.gz  backup.log-20240715.gz
backup.log-20240615.gz  backup.log-20240701.gz  backup.log-20240716.gz
backup.log-20240616.gz  backup.log-20240702.gz  backup.log-20240718.gz
backup.log-20240617.gz  backup.log-20240704.gz  backup.log-20240719.gz
backup.log-20240619.gz  backup.log-20240706.gz  backup.log-20240721.gz
backup.log-20240621.gz  backup.log-20240707.gz  backup.log-20240722.gz
backup.log-20240622.gz  backup.log-20240708.gz  backup.log-20240724.gz
backup.log-20240623.gz  backup.log-20240709.gz  backup.log-20240725.gz
backup.log-20240625.gz  backup.log-20240711.gz  backup.log-20240726
backup.log-20240626.gz  backup.log-20240712.gz
backup.log-20240627.gz  backup.log-20240713.gz
[root@opensourceecology ~]# 

[root@opensourceecology ~]# tail -n20 /var/log/backups/backup.log-20240726 
 *        daily_hetzner2_20240726_072001.tar.gpg:100% /20.424G, 2.935M/s, -

2024/07/26 09:50:31 INFO  : daily_hetzner2_20240726_072001.tar.gpg: Copied (new)
2024/07/26 09:50:31 INFO  : 
Transferred:       20.424G / 20.424 GBytes, 100%, 2.979 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:    1h57m0.8s


real    117m1.219s
user    4m20.240s
sys     2m9.432s
+ echo ================================================================================
================================================================================
++ date -u +%Y%m%d_%H%M%S
+ echo 'INFO: Finished Backup Run at 20240726_095031'
INFO: Finished Backup Run at 20240726_095031
+ echo ================================================================================
================================================================================
+ exit 0
[root@opensourceecology ~]# 
  1. the query of b2 backup files also looks good
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups | grep `date "+%Y%m%d"`
daily_hetzner2_20240726_072001.tar.gpg
[root@opensourceecology ~]# date -u
Fri Jul 26 16:03:55 UTC 2024
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups
daily_hetzner2_20240724_072001.tar.gpg
daily_hetzner2_20240725_072001.tar.gpg
daily_hetzner2_20240726_072001.tar.gpg
monthly_hetzner2_20230801_072001.tar.gpg
monthly_hetzner2_20230901_072001.tar.gpg
monthly_hetzner2_20231001_072001.tar.gpg
monthly_hetzner2_20231101_072001.tar.gpg
monthly_hetzner2_20231201_072001.tar.gpg
monthly_hetzner2_20240201_072001.tar.gpg
monthly_hetzner2_20240301_072001.tar.gpg
monthly_hetzner2_20240401_072001.tar.gpg
monthly_hetzner2_20240501_072001.tar.gpg
monthly_hetzner2_20240601_072001.tar.gpg
monthly_hetzner2_20240701_072001.tar.gpg
weekly_hetzner2_20240708_072001.tar.gpg
weekly_hetzner2_20240715_072001.tar.gpg
weekly_hetzner2_20240722_072001.tar.gpg
yearly_hetzner2_20190101_111520.tar.gpg
yearly_hetzner2_20200101_072001.tar.gpg
yearly_hetzner2_20210101_072001.tar.gpg
yearly_hetzner2_20230101_072001.tar.gpg
yearly_hetzner2_20240101_072001.tar.gpg
[root@opensourceecology ~]# 
  1. that backup is already 8 hours old; so let's bring down the webserver + stop the databases and take a real fresh backup before we do anything
  2. stopped nginx
[root@opensourceecology ~]# # create dir for logging the change
[root@opensourceecology ~]# tmpDir="/var/tmp/CHG-2024-07-26_yum_update"
[root@opensourceecology ~]# mkdir -p $tmpDir
[root@opensourceecology ~]# 
[root@opensourceecology ~]# # begin to gracefully shutdown nginx in the background
[root@opensourceecology ~]# time nice /sbin/nginx -s quit
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11
nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11

real    0m0.078s
user    0m0.038s
sys     0m0.021s
[root@opensourceecology ~]# 

[root@opensourceecology ~]# date -u
Fri Jul 26 16:06:37 UTC 2024
[root@opensourceecology ~]# 
  1. stopped DBs
[root@opensourceecology ~]# systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2024-07-22 18:55:28 UTC; 3 days ago
  Process: 1230 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)
  Process: 1099 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 1229 (mysqld_safe)
   CGroup: /system.slice/mariadb.service
		   ├─1229 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
		   └─1704 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql ...

Jul 22 18:55:25 opensourceecology.org systemd[1]: Starting MariaDB database s....
Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: Database M...
Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: If this is...
Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql...
Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql...
Jul 22 18:55:28 opensourceecology.org systemd[1]: Started MariaDB database se....
Hint: Some lines were ellipsized, use -l to show in full.
[root@opensourceecology ~]# systemctl stop mariadb
[root@opensourceecology ~]# systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2024-07-26 16:07:43 UTC; 3s ago
  Process: 1230 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)
  Process: 1229 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
  Process: 1099 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 1229 (code=exited, status=0/SUCCESS)

Jul 22 18:55:25 opensourceecology.org systemd[1]: Starting MariaDB database s....
Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: Database M...
Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: If this is...
Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql...
Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql...
Jul 22 18:55:28 opensourceecology.org systemd[1]: Started MariaDB database se....
Jul 26 16:07:40 opensourceecology.org systemd[1]: Stopping MariaDB database s....
Jul 26 16:07:43 opensourceecology.org systemd[1]: Stopped MariaDB database se....
Hint: Some lines were ellipsized, use -l to show in full.
[root@opensourceecology ~]# 
  1. the backup is taking a long time. While I wait, I checked `top`, and I see `gzip` is using 80%-100% CPU
    1. so it seems that gzip is bound by a single core. It could go much faster if it could be split across multiple cores (parallel processing)
    2. quick googling while I wait suggests that we could use `pigz` as a replacement to `gzip` to get this (admittedly low priority) performance boost https://stackoverflow.com/questions/12313242/utilizing-multi-core-for-targzip-bzip-compression-decompression
    3. there are other options too. Apparently xz has had native multi-threaded support since v5.2.0 https://askubuntu.com/a/858828
    4. there's also pbzip2 for bzip, and many others https://askubuntu.com/a/258228
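A rough sketch of what that swap could look like in the backup script (untested; paths are hypothetical, and pigz itself would need to be installed, e.g. from EPEL):

  # today (roughly): tar pipes through single-threaded gzip
  tar -czf www.tar.gz /var/www
  # alternative: tar hands compression to pigz, which spreads the work across all cores
  tar --use-compress-program=pigz -cf www.tar.gz /var/www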
  2. the other two commands that get stuck on one core are `tar` and `gpg2`
    1. it looks like gpg also compresses its input by default. That gives us no benefit in our case because we're encrypting a tarball that just contains a bunch of already-compressed tarballs. So we could probably get some performance improvement by telling gpg to skip compression with `--compress-algo none` https://stackoverflow.com/questions/46261024/how-to-do-large-file-parallel-encryption-using-gnupg-and-gnu-parallel
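A minimal sketch of what that could look like (untested; filenames are placeholders, and the exact flags may need adjusting for the gpg2 version on CentOS 7):

  # symmetric encryption without recompressing the already-compressed payload
  gpg --batch --symmetric --cipher-algo AES256 --compress-algo none \
      --passphrase-file /root/backups/ose-backups-cron.key \
      --output daily_hetzner2_YYYYMMDD.tar.gpg daily_hetzner2_YYYYMMDD.tar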
  3. finally (after ~30 min to generate the encrypted backup file), rclone is using >100% of CPU to upload it, so that's good. Our script does limit upload to 3 MB/s. I guess one improvement would be some argument to bypass that throttle
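A hypothetical tweak to backup.sh that would make the throttle overridable without editing the script each time (variable names are made up; recent rclone accepts 'off' to disable the limit, older versions use 0):

  # in backup.sh: default to the existing 3 MB/s cap, but allow an override
  BWLIMIT="${BWLIMIT:-3M}"
  sudo -u b2user /bin/rclone -v copy --bwlimit "$BWLIMIT" "$ENCRYPTED_BACKUP" b2:ose-server-backups
  # a manual run could then do:  BWLIMIT=off /root/backups/backup.sh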
  4. it said the upload was going to take just under 2 hours, so I canceled it and manually ran the upload command (minus the throttle)
    1. upload speeds are now ~27-32 MB/s (so ~10x faster). It says it'll finish in just over 10 minutes.
  5. upload is done
[root@opensourceecology ~]# time sudo /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log

^C
real    33m47.250s
user    23m56.551s
sys     2m2.866s
[root@opensourceecology ~]#

[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20240726_160837.tar.gpg b2:ose-server-backups
...
2024/07/26 16:56:38 INFO  : 
Transferred:       18.440G / 19.206 GBytes, 96%, 22.492 MBytes/s, ETA 34s
Transferred:            0 / 1, 0%
Elapsed time:      14m0.5s
Transferring:
 *        daily_hetzner2_20240726_160837.tar.gpg: 96% /19.206G, 21.268M/s, 36s

2024/07/26 16:57:36 INFO  : daily_hetzner2_20240726_160837.tar.gpg: Copied (new)
2024/07/26 16:57:36 INFO  : 
Transferred:       19.206G / 19.206 GBytes, 100%, 21.910 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:     14m58.6s

[root@opensourceecology ~]# 
  1. ok, this very durable backup is uploaded; let's proceed
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups | grep `date "+%Y%m%d"`
daily_hetzner2_20240726_072001.tar.gpg
daily_hetzner2_20240726_160837.tar.gpg
[root@opensourceecology ~]# date -u
Fri Jul 26 16:58:11 UTC 2024
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups
daily_hetzner2_20240724_072001.tar.gpg
daily_hetzner2_20240725_072001.tar.gpg
daily_hetzner2_20240726_072001.tar.gpg
daily_hetzner2_20240726_160837.tar.gpg
monthly_hetzner2_20230801_072001.tar.gpg
monthly_hetzner2_20230901_072001.tar.gpg
monthly_hetzner2_20231001_072001.tar.gpg
monthly_hetzner2_20231101_072001.tar.gpg
monthly_hetzner2_20231201_072001.tar.gpg
monthly_hetzner2_20240201_072001.tar.gpg
monthly_hetzner2_20240301_072001.tar.gpg
monthly_hetzner2_20240401_072001.tar.gpg
monthly_hetzner2_20240501_072001.tar.gpg
monthly_hetzner2_20240601_072001.tar.gpg
monthly_hetzner2_20240701_072001.tar.gpg
weekly_hetzner2_20240708_072001.tar.gpg
weekly_hetzner2_20240715_072001.tar.gpg
weekly_hetzner2_20240722_072001.tar.gpg
yearly_hetzner2_20190101_111520.tar.gpg
yearly_hetzner2_20200101_072001.tar.gpg
yearly_hetzner2_20210101_072001.tar.gpg
yearly_hetzner2_20230101_072001.tar.gpg
yearly_hetzner2_20240101_072001.tar.gpg
[root@opensourceecology ~]# 
  1. we have a snapshot of the current state of packages
[root@opensourceecology ~]# time nice rpm -qa &> "${tmpDir}/before.log"

real    0m0.716s
user    0m0.678s
sys     0m0.037s
[root@opensourceecology ~]#

[root@opensourceecology ~]# echo $tmpDir
/var/tmp/CHG-2024-07-26_yum_update
[root@opensourceecology ~]#

[root@opensourceecology ~]# tail /var/tmp/CHG-2024-07-26_yum_update/before.log 
libdb-utils-5.3.21-25.el7.x86_64
libuser-0.60-9.el7.x86_64
python-lxml-3.2.1-4.el7.x86_64
net-snmp-agent-libs-5.7.2-48.el7_8.x86_64
epel-release-7-14.noarch
perl-parent-0.225-244.el7.noarch
libstdc++-devel-4.8.5-39.el7.x86_64
libsodium13-1.0.5-1.el7.x86_64
ncurses-5.9-14.20130511.el7_4.x86_64
e2fsprogs-libs-1.42.9-17.el7.x86_64
[root@opensourceecology ~]# 
  1. I kicked-off the updates. I got a bit of a fright at first when we got "404 Not Found" errors from 484 mirrors, but eventually `yum` found a server. I'm glad we did the updates now, before all the mirrors shut down (CentOS 7 stopped getting full updates years ago, and stopped getting maintenance updates entirely as of a few weeks ago)
[root@opensourceecology ~]# grep "Error 404" /var/tmp/CHG-2024-07-26_yum_update/update.log  | wc -l
484
[root@opensourceecology ~]# 

[root@opensourceecology ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
[root@opensourceecology ~]# 
  1. actually, it says it's updating 434 packages total. So I guess some dependencies got added to the 200-odd count before
  2. ok, the update command finished in just under 4 minutes of wall time
...
real    3m56.410s
user    2m1.833s
sys     0m44.510s
[root@opensourceecology ~]#
  1. post update info
[root@opensourceecology ~]# # log the post-state packages and versions
[root@opensourceecology ~]# time nice rpm -qa &> "${tmpDir}/after.log"

real    0m0.805s
user    0m0.769s
sys     0m0.036s
[root@opensourceecology ~]#
[root@opensourceecology ~]# time nice needs-restarting &> "${tmpDir}/needs-restarting.log"


real    0m8.156s
user    0m6.956s
sys     0m0.652s
[root@opensourceecology ~]# time nice needs-restarting -r &> "${tmpDir}/needs-reboot.log"

real    0m0.155s
user    0m0.104s
sys     0m0.051s
[root@opensourceecology ~]# 

[root@opensourceecology ~]# cat /var/tmp/CHG-2024-07-26_yum_update/needs-reboot.log 
Core libraries or services have been updated:
  systemd -> 219-78.el7_9.9
  dbus -> 1:1.10.24-15.el7
  openssl-libs -> 1:1.0.2k-26.el7_9
  linux-firmware -> 20200421-83.git78c0348.el7_9
  kernel -> 3.10.0-1160.119.1.el7
  glibc -> 2.17-326.el7_9.3

Reboot is required to ensure that your system benefits from these updates.

More information:
https://access.redhat.com/solutions/27943
[root@opensourceecology ~]# 

[root@opensourceecology ~]# cat /var/tmp/CHG-2024-07-26_yum_update/needs-restarting.log 
30842 : /usr/lib/systemd/systemd-udevd 
13696 : sshd: maltfield@pts/0
27401 : /bin/bash 
744 : /sbin/auditd 
19086 : /bin/bash 
13692 : sshd: maltfield [priv]
30672 : smtpd -n smtp -t inet -u 
13699 : -bash 
18035 : su - 
27436 : less /root/backups/backup.sh 
18036 : -bash 
18030 : sudo su - 
1484 : /var/ossec/bin/ossec-analysisd 
24493 : /bin/bash 
21581 : su - 
21580 : sudo su - 
21582 : -bash 
797 : /usr/lib/systemd/systemd-logind 
24476 : /bin/bash 
1830 : qmgr -l -t unix -u 
30673 : proxymap -t unix -u 
19119 : sudo su - 
24511 : /bin/bash 
29833 : local -t unix 
27417 : sudo su - 
19130 : -bash 
1 : /usr/lib/systemd/systemd --system --deserialize 23 
29830 : cleanup -z -t unix -u 
1500 : /var/ossec/bin/ossec-logcollector 
24475 : SCREEN -S upgrade 
2150 : /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a 127.0.0.1:6081 -T 127.0.0.1:6082 -S /etc/varnish/secret -u varnish -g varnish -s malloc,40G 
2152 : /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a 127.0.0.1:6081 -T 127.0.0.1:6082 -S /etc/varnish/secret -u varnish -g varnish -s malloc,40G 
29835 : bounce -z -t unix -u 
775 : /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation 
27419 : -bash 
585 : /usr/lib/systemd/systemd-journald 
771 : /usr/sbin/irqbalance --foreground 
770 : /usr/sbin/acpid 
1170 : /sbin/agetty --noclear tty1 linux 
30690 : smtp -t unix -u 
778 : /usr/sbin/chronyd 
8695 : gpg-agent --daemon --use-standard-socket 
24529 : /bin/bash 
2121 : /var/ossec/bin/ossec-syscheckd 
1806 : /usr/libexec/postfix/master -w 
19129 : su - 
19065 : /bin/bash 
2124 : /var/ossec/bin/ossec-monitord 
29832 : trivial-rewrite -n rewrite -t unix -u 
19044 : /bin/bash 
30693 : smtp -t unix -u 
30692 : smtp -t unix -u 
30691 : cleanup -z -t unix -u 
27418 : su - 
1475 : /var/ossec/bin/ossec-execd 
19025 : /bin/bash 
19024 : SCREEN -S CHG-2024-07-26_yum_update 
1458 : /var/ossec/bin/ossec-maild 
19023 : screen -S CHG-2024-07-26_yum_update 
[root@opensourceecology ~]# 
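Not a step I ran, but the before/after snapshots make it easy to enumerate exactly which packages changed; a minimal sketch:

  # lines starting with < are pre-update versions; lines starting with > are post-update versions
  diff <(sort /var/tmp/CHG-2024-07-26_yum_update/before.log) \
       <(sort /var/tmp/CHG-2024-07-26_yum_update/after.log) | grep -E '^[<>]' | sort -k 2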
  1. alright, time to reboot
[root@opensourceecology ~]# reboot
Connection to opensourceecology.org closed by remote host.
Connection to opensourceecology.org closed.
user@personal:~$ 
  1. system came back in about 1 minute
  2. first attempt to load the wiki resulted in a 503 "Error 503 Backend fetch failed" from varnish
  3. it's not just warming up; apache didn't come up on boot
[root@opensourceecology ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2024-07-26 17:09:47 UTC; 2min 7s ago
	 Docs: man:httpd(8)
		   man:apachectl(8)
  Process: 1094 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
 Main PID: 1094 (code=exited, status=1/FAILURE)

Jul 26 17:09:47 opensourceecology.org systemd[1]: Starting The Apache HTTP Se....
Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use:...
Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use:...
Jul 26 17:09:47 opensourceecology.org httpd[1094]: no listening sockets availa...
Jul 26 17:09:47 opensourceecology.org httpd[1094]: AH00015: Unable to open logs
Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service: main process...E
Jul 26 17:09:47 opensourceecology.org systemd[1]: Failed to start The Apache ....
Jul 26 17:09:47 opensourceecology.org systemd[1]: Unit httpd.service entered ....
Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@opensourceecology ~]# 
  1. it says that the port is already in use
[root@opensourceecology ~]# journalctl -u httpd --no-pager
-- Logs begin at Fri 2024-07-26 17:09:34 UTC, end at Fri 2024-07-26 17:15:26 UTC. --
Jul 26 17:09:47 opensourceecology.org systemd[1]: Starting The Apache HTTP Server...
Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:443
Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:443
Jul 26 17:09:47 opensourceecology.org httpd[1094]: no listening sockets available, shutting down
Jul 26 17:09:47 opensourceecology.org httpd[1094]: AH00015: Unable to open logs
Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE
Jul 26 17:09:47 opensourceecology.org systemd[1]: Failed to start The Apache HTTP Server.
Jul 26 17:09:47 opensourceecology.org systemd[1]: Unit httpd.service entered failed state.
Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service failed.
[root@opensourceecology ~]# 
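I didn't capture it here, but a couple of quick checks that would narrow down what's squatting on :443 (on this stack nginx normally terminates TLS on 443, so the likely culprit is a newly-installed default conf telling httpd to Listen on 443 itself):

  # which process already holds :443?
  ss -tlnp | grep ':443'
  # did the update drop in a default conf with its own Listen 443 directive?
  grep -ri '^\s*Listen' /etc/httpd/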
  1. before I start making changes, I'm going to initiate another backup (and wait at least 30 minutes for the tar to finish)

  1. I'm going to want to diff the apache configs, so I copied the backup that I made just before the updates into the temp CHG dir
[root@opensourceecology CHG-2024-07-26_yum_update]# mkdir backup_before
[root@opensourceecology CHG-2024-07-26_yum_update]# rsync -av --progress /home/b2user/sync.old/daily_hetzner2_20240726_160837.tar.gpg backup_before/
sending incremental file list
daily_hetzner2_20240726_160837.tar.gpg
 20,622,312,871 100%  127.14MB/s    0:02:34 (xfr#1, to-chk=0/1)

sent 20,627,347,744 bytes  received 35 bytes  133,510,341.61 bytes/sec
total size is 20,622,312,871  speedup is 1.00
[root@opensourceecology CHG-2024-07-26_yum_update]# 
  1. well, unfortunately the wiki being down means I can't reference our docs on how to restore backups, but I managed to figure it out
[root@opensourceecology backup_before]# gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --decrypt daily_hetzner2_20240726_160837.tar.gpg > daily_hetzner2_20240726_160837.tar
gpg: AES256 encrypted data
gpg: encrypted with 1 passphrase
[root@opensourceecology backup_before]# 

[root@opensourceecology backup_before]# du -sh *
20G     daily_hetzner2_20240726_160837.tar
20G     daily_hetzner2_20240726_160837.tar.gpg
[root@opensourceecology backup_before]# 

[root@opensourceecology backup_before]# tar -xf daily_hetzner2_20240726_160837.tar
[root@opensourceecology backup_before]# 

[root@opensourceecology backup_before]# du -sh *
20G     daily_hetzner2_20240726_160837.tar
20G     daily_hetzner2_20240726_160837.tar.gpg
20G     root
[root@opensourceecology backup_before]# rm -f daily_hetzner2_20240726_160837.tar.gpg 
[root@opensourceecology backup_before]#

[root@opensourceecology backup_before]# du -sh *
20G     daily_hetzner2_20240726_160837.tar
20G     root
[root@opensourceecology backup_before]# 
  1. to make this easier for the next person, I created a README directly in the backups dir
[root@opensourceecology backups]# cat /root/backups/README.txt 
2024-07-26
==========

The process to restore from backups is documented on the wiki

 * https://wiki.opensourceecology.org/wiki/Backblaze#Restore_from_backups

Oh, the wiki is down and you need to restore from backups to restore the wiki? Don't worry, I got you.

All backups are stored on Backblaze B2. You can download them with rclone or just by logging into the Backblaze B2 WUI.

First decrypt the main wrapper tar with `gpg`

  gpg --batch --passphrase-file <path-to-symmetric-encrypton-private-key> --decrypt <path-to-encrypted-tarball> > <path-to-decrypted-tarball>

For example:

  gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --decrypt daily_hetzner2_20240726_160837.tar.gpg > daily_hetzner2_20240726_160837.tar

Then you can untar the wrapper tarball and the compressed tarball inside of that. For example:

  tar -xf daily_hetzner2_20240726_160837.tar
  cd root/backups/sync/daily_hetzner2_20240726_160837/www/
  tar -xf www.20240726_160837.tar.gz
  head var/www/html/www.opensourceecology.org/htdocs/index.php

--Michael Altfield <https://michaelaltfield.net.>
[root@opensourceecology backups]# 
  1. and I was able to extract the www files from the backups prior to the update
[root@opensourceecology backup_before]# cd root/backups/sync/daily_hetzner2_20240726_160837/www/
[root@opensourceecology www]#

[root@opensourceecology www]# ls
www.20240726_160837.tar.gz
[root@opensourceecology www]#

[root@opensourceecology www]# tar -xf www.20240726_160837.tar.gz
[root@opensourceecology www]#

[root@opensourceecology www]# du -sh *
32G     var
19G     www.20240726_160837.tar.gz
[root@opensourceecology www]#
  1. oh, actually I want the /etc/ config files
[root@opensourceecology www]# cd ../etc
[root@opensourceecology etc]# 

[root@opensourceecology etc]# tar -xf etc.20240726_160837.tar.gz 
[root@opensourceecology etc]# 

[root@opensourceecology etc]# du -sh *
46M     etc
13M     etc.20240726_160837.tar.gz
[root@opensourceecology etc]# 
  1. a diff of the pre-update configs and the current configs shows 4x new files
[root@opensourceecology etc]# diff -ril etc/httpd /etc/httpd
diff: etc/httpd/logs: No such file or directory
diff: etc/httpd/modules: No such file or directory
Only in /etc/httpd/conf.d: autoindex.conf
Only in /etc/httpd/conf.d: ssl.conf
Only in /etc/httpd/conf.d: userdir.conf
Only in /etc/httpd/conf.d: welcome.conf
[root@opensourceecology etc]#
  1. I just moved these 4x files out (into our tmp change dir), and tried a restart; it came up
[root@opensourceecology CHG-2024-07-26_yum_update]# mkdir moved_from_etc_httpd
[root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/autoindex.conf moved_from_etc_httpd/
[root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/ssl.conf moved_from_etc_httpd/
[root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/userdir.conf moved_from_etc_httpd/
[root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/welcome.conf moved_from_etc_httpd/
[root@opensourceecology CHG-2024-07-26_yum_update]# 

[root@opensourceecology CHG-2024-07-26_yum_update]# systemctl restart httpd
[root@opensourceecology CHG-2024-07-26_yum_update]#

[root@opensourceecology CHG-2024-07-26_yum_update]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-07-26 17:59:36 UTC; 4s ago
	 Docs: man:httpd(8)
		   man:apachectl(8)
 Main PID: 15908 (httpd)
   Status: "Processing requests..."
   CGroup: /system.slice/httpd.service
		   ├─15908 /usr/sbin/httpd -DFOREGROUND
		   ├─15910 /usr/sbin/httpd -DFOREGROUND
		   ├─15911 /usr/sbin/httpd -DFOREGROUND
		   ├─15912 /usr/sbin/httpd -DFOREGROUND
		   ├─15913 /usr/sbin/httpd -DFOREGROUND
		   ├─15914 /usr/sbin/httpd -DFOREGROUND
		   ├─15921 /usr/sbin/httpd -DFOREGROUND
		   ├─15927 /usr/sbin/httpd -DFOREGROUND
		   ├─15928 /usr/sbin/httpd -DFOREGROUND
		   ├─15936 /usr/sbin/httpd -DFOREGROUND
		   ├─15937 /usr/sbin/httpd -DFOREGROUND
		   ├─15938 /usr/sbin/httpd -DFOREGROUND
		   └─15939 /usr/sbin/httpd -DFOREGROUND

Jul 26 17:59:36 opensourceecology.org systemd[1]: Starting The Apache HTTP Se....
Jul 26 17:59:36 opensourceecology.org systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@opensourceecology CHG-2024-07-26_yum_update]# 
  1. I was able to load and edit the wiki; I spent some time adding some updates to the CHG article https://wiki.opensourceecology.org/wiki/CHG-2020-05-04_yum_update
  2. for some reason my browser keeps locking up when all I'm trying to do is edit the text in the textarea for ^ this wiki article. I don't use the WYSIWYG editor. I'm literally just editing text in a textarea; that shouldn't require any processing
  3. It took me ~20 minutes just to make a few changes to one wiki article because the page in firefox kept locking up, sometimes displaying a spinning circle over the page
  4. I launched a new DispVM with *only* firefox running and *only* one tab open in firefox. The issue persisted, and the VM with the (idle) firefox on the edit page was taxed with 20-60% CPU usage; something is definitely wrong, but it's unclear if the bug is on our mediawiki server, my firefox client, or both
  5. anyway, I'm continuing with the validation steps
  6. I was successfully able to load the frontpage of all the 9x websites
    1. the logo at the top (and bottom) of https://oswh.opensourceecology.org/ was missing, but I'm not sure if that was the case before the updates or not
      1. I simply get a 404 on the image http://www.opensourcewarehouse.org/wp-content/uploads/2013/02/headfooter-logonew.png
      2. I guess the domain is wrong; we don't appear to use opensourcewarehouse.org anymore, so this is probably an issue that predates our updates
    2. everything else looked good
  7. I logged into the munin. It loads fine
    1. I do see some gaps in the mysql charts where everything drops to 0 for a few hours, which I guess is when/why Marcin was doing reboots again. My job right now isn't to investigate this, but I'm just making a note here
    2. otherwise munin is working; validated.
  8. I logged into awstats. It loads fine
    1. I just quickly scanned the main pages for www.opensourceecology.org and wiki.opensourceecology.org; they look fine
  9. I already tested edits on wiki.opensourceecology.org; they're working (setting aside the client-side lag)
  10. I was successfully able to make a trivial change to the main wordpress site, and then revert that change https://www.opensourceecology.org/offline-wiki-zim-kiwix/
  11. the only thing left is the backups, which have been running in the background since shortly after the reboot
    1. the backups finished being created successfully
    2. the backups are currently being uploaded at the rate-limited 3 MB/s. they're at 39% now, and estimated to finish uploading in 1h10m from now.
    3. the upload is the last step; that's good enough for me to consider the backups functional
  12. that completes our validation; I think it's safe to mark this change as successful
  13. I sent an update email to Marcin & Catarina
Hey Marcin & Catarina,

I've finished updating the system packages on the hetzner2 server.

It's a very good thing that we did this, because your server tried and failed to download its updates from 484 mirrors before it finally found a server that it could download its updates from.

As I mentioned in Nov 2022, your server runs CentOS 7, which stopped receiving "Full Updates" by Red Hat in Aug 2020. As of Jun 2024, it is no longer going to be updated in any way (security, maintenance, etc). At some point in the future, I guess all of their update servers will go down too. We're lucky at least one was still online.

 * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2022#Wed_November_02.2C_2022

Today I was successfully able to update 434 system packages onto hetzner2. I did some quick validation of a subset of your websites, and I only found a couple minor errors

 1. The header & footer images of oswh don't load https://oswh.opensourceecology.org/
 2. Editing the wiki sometimes causes my browser to lock-up; it's not clear if this is a pre-existing issue, or if the issue is caused by your server or my client

I did not update your server's applications that cannot be updated by the package manager (eg wordpress, mediawiki, etc). If you don't detect any issues with your server, then I would recommend that we do the application upgrade simultaneously with a migration to a new server running Debian.

I'd like to stress again the urgency of the need to migrate off of CentOS 7. Besides the obvious security risks of running a server that is no longer receiving security patches, at some point in the likely-not-too-distant future, your server is going to break and it will be extremely non-trivial to fix it. The deadline for migrating was in 2020. I highly recommend prioritizing a project to migrate your server to a new Debian server ASAP.

Please spend some time testing your various websites, and let me know if you experience any issues.


Thank you,

Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org 
  1. I confirmed the list of updates on the server is now empty
[root@opensourceecology CHG-2024-07-26_yum_update]# yum list updates
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
 * base: ftp.plusline.net
 * epel: mirrors.n-ix.net
 * extras: ftp.plusline.net
 * updates: mirror.checkdomain.de
[root@opensourceecology CHG-2024-07-26_yum_update]# 
  1. I'm considering the change successful
  2. looks like my tmp change dir pushed the disk to 86% capacity; let's clean that up
[root@opensourceecology CHG-2024-07-26_yum_update]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G     0   32G   0% /dev/shm
tmpfs            32G   17M   32G   1% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/md2        197G  161G   27G  86% /
/dev/md1        488M  386M   77M  84% /boot
tmpfs           6.3G     0  6.3G   0% /run/user/1005
[root@opensourceecology CHG-2024-07-26_yum_update]# ls
after.log      before.log            needs-reboot.log      update.log
backup_before  moved_from_etc_httpd  needs-restarting.log
[root@opensourceecology CHG-2024-07-26_yum_update]# du -sh *
28K     after.log
70G     backup_before
28K     before.log
28K     moved_from_etc_httpd
4.0K    needs-reboot.log
4.0K    needs-restarting.log
216K    update.log
[root@opensourceecology CHG-2024-07-26_yum_update]# ls before.log 
before.log
[root@opensourceecology CHG-2024-07-26_yum_update]#

[root@opensourceecology CHG-2024-07-26_yum_update]# ls backup_before/
daily_hetzner2_20240726_160837.tar  root
[root@opensourceecology CHG-2024-07-26_yum_update]#

[root@opensourceecology CHG-2024-07-26_yum_update]# du -sh backup_before/*
20G     backup_before/daily_hetzner2_20240726_160837.tar
51G     backup_before/root
[root@opensourceecology CHG-2024-07-26_yum_update]#

[root@opensourceecology CHG-2024-07-26_yum_update]# ls backup_before/root
backups
[root@opensourceecology CHG-2024-07-26_yum_update]#

[root@opensourceecology CHG-2024-07-26_yum_update]# rm -rf backup_before/root
[root@opensourceecology CHG-2024-07-26_yum_update]#
  1. great, now we're down to 59%
[root@opensourceecology CHG-2024-07-26_yum_update]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G     0   32G   0% /dev/shm
tmpfs            32G   17M   32G   1% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/md2        197G  110G   78G  59% /
/dev/md1        488M  386M   77M  84% /boot
tmpfs           6.3G     0  6.3G   0% /run/user/1005
[root@opensourceecology CHG-2024-07-26_yum_update]# 

Wed July 24, 2024

  1. Marcin contacted me a few days ago saying that the server needs reboots again
  2. I found that the last time we did a system package update was in 2020, over 4 years ago. I strongly recommended that we update the system packages, and probably the web applications as well
    1. here's the link to the last time I updated the system packages in May 2020 https://wiki.opensourceecology.org/wiki/CHG-2020-05-04_yum_update
  3. I also noted that CentOS is now not only EOL, but it's also no longer receiving (security) updates.
    1. I warned Marcin about this approaching deadline in Nov 2022, and urged him to migrate to a new OS before 2024.
  In my prior work at OSE, I've done my best to design your systems to be robust and "well oiled" so that they would run for as long as possible with as little maintenance as possible. However, code rots over time, and there's only so long you can hold-off before things fall apart.

  Python 2.7.5 was End-of-Life'd on 2020-01-01, and it no longer receives any updates.

   * https://en.wikipedia.org/wiki/History_of_Python

  CentOS 7.7 was released 2019-09-17. "Full Updates" stopped 2020-08-06, and it will no longer receive any maintenance updates after 2024-06-30.

   * https://wiki.centos.org/About/Product

  At some point, you're going to want to migrate to a new server with a new OS. I strongly recommend initiating this project before 2024. 
    1. Here's the log entry https://wiki.opensourceecology.org/wiki/Maltfield_Log/2022#Wed_November_02.2C_2022
    2. I told Marcin to budget for ~$10,000 to migrate to a new server, as it's going to be a massive project that will likely require more than a month of full-time work to complete the migration
  1. Marcin said I should go ahead and prepare a CHG "ticket" article for the upgrade and schedule a time to do it
  2. I prepared a change ticket for updating the system packages on Friday https://wiki.opensourceecology.org/wiki/CHG-2024-07-26_yum_update
  3. I also noticed that I kept getting de-auth'd every few minutes on the wiki. That's annoying. Hopefully the updates will make this (and other) issues go away.
  4. If we migrate to Debian, then we'd need to migrate to a new server
    1. previously when we migrated from hetzner1 to hetzner2, we got a 16x increase in RAM (from 4GB to 64GB). And the price of both servers was the same!
    2. I was expecting the next jump would have similar results: we'd migrate to a new server that costs the same for much better specs, but that's not looking like it's going to be the case :(
    3. Here's the currently-offered dedicated servers at hetzner https://www.hetzner.com/dedicated-rootserver/
    4. Currently we have 8-cores, 64G RAM, and two 250G disks in a RAID-1 software array. We pay 39 EUR/mo
    5. The cheapest dedicated server (EX44) currently is 46.41 EUR/month and comes with 14-cores, 64G RAM, and 2x 512G disks. That should meet our requirements https://www.hetzner.com/dedicated-rootserver/ex44/configurator/#/
      1. oh crap, we'd be downgrading the proc from the i7 (Intel® Core™ i7-6700) to an i5 (Intel® Core™ i5-13500)
      2. I'd have to check the munin charts, but I would be surprised if we ever break a load of 2, so that's still probably fine.
  5. I met with Marcin tonight to discuss [a] the system-level package upgrades, [b] the application (eg wordpress, mediawiki, etc) upgrades, and [c] the server migration
    1. I recommended that Marcin do the updates on staging, and described the risk of not doing it
      1. the problem is that the current staging environment is down, and it may take a few days to restore it
      2. the risk is maybe a few days of downtime instead of a smaller change window during the update
    2. we agreed that I'll do the system-level package upgrades direct-to-production; Marcin accepted the risk of a few days of downtime
    3. Marcin also mentioned that Hetzner has a "server auction" page, which has some more servers that meet our needs at a slightly discounted price https://www.hetzner.com/sb/
      1. many of these are 37.72 EUR/mo, so they're actually *cheaper* than our current 39 EUR/mo. Great!
      2. there are >3 pages of servers at this 37.72 EUR/mo price. One of them has 2x 4TB drives (though it looks like they're spinning disks). This seems to be a server graveyard of machines built-to-spec for previous customers. We should be able to find one that meets our needs, so that means we'll easily double our disk and save ~15 EUR per year. Cool :)
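      (for reference, the math on that savings figure: 39.00 - 37.72 = 1.28 EUR/mo, and 1.28 * 12 ≈ 15.4 EUR/yr)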