Maltfield Log/2024 Q3
My work log from the third quarter of the year 2024. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.
See Also
Wed July 31, 2024
- This morning I woke-up to some emails from Hetzner indicating that our Hetzner3 order is finished, and responding to my questions
>1. By default, is the two disks configured in a RAID1 array?
Servers from server auction are without pre-installed OS, so that no Software-Raid is pre-configured.
>2. Do we have any other RAID options?
The server have two disks, so that you could install an Linux OS via installimage script. There you would have the option to install it without raid or raid1 or raid0 is possible with two disks. Please find information about installimage script here: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/
>3. How many additional (empty/unused) disk slots does this dedicated server have? What options would we have for adding additional disks to this machine in the future, if needed
You can add one additional NVMe or up to two additional sata SSD or sata HDD.
>4. We ordered this because it has "M.2 NVME disks" as opposed to the "SSD" disks. Can you confirm that the NVME disks are faster than "SSD" disks? If not, please cancel our order and we'll purchase another machine with 3x 512G SSD disks
Usually NVMe SSD are faster than sata ssd.
>5. Currently we have another dedicated server with "2 x 250 GB SATA 6 Gb/s SSD". Can you please tell us the "Gb/s" throughput for this server's disks?
Unfortunately I don't have information about it. You should test it yourself on your server.
- so it looks like we did well snagging the auction for the last available, lowest-priced server with "2x SSD M.2 NVMe 512 GB"
- I checked their server auction page again, and I do still see one server available at their lowest 37.72 EUR/mo price with 2x 512G NVMe disks, so I guess one listing doesn't necessarily mean that there's only one server available.
- after migration, we should end-up with a 25% full disk. If we ~triple our current disk usage before we retire this server, we have the ability to add two more non-NVMe SATA SSD disks in another RAID1, which we can partition-up as-needed for our backups, tmp files, etc (hopefully we can keep www and DB on the faster NVMe disks)
- their hardware page has info on the addon SSD disks that they lease, and their prices https://docs.hetzner.com/robot/dedicated-server/general-information/root-server-hardware/#drives
- it looks like their cheapest non-NVMe SSD is 8.50 EUR/mo for a 1T disk. Of course, we would need two for the RAID, so that means we can 4x our disk space in the future for an additional 17 EUR/mo. They also have a 3.84 TB SATA SSD for 37 EUR/mo, and they have both 16T and 22T SATA HDDs for 20.50 and 27.00 EUR/mo, respectively.
- we also get a free 100GB "storage box" along with our purchase, which I guess is some external NFS mount. It's probably slow as hell, but we *could* use it for something like our two local copies of encrypted backup data. We also have this as part of our hetzner2 plan, but we don't use it.
- the free 100G storage box is named "BX10". We can even increase this to a BX11 (1T @ 3.20 EUR/mo), BX21 (5T @ 10.90 EUR/mo), BX31 (20.80 EUR/mo), or BX41 (40.60 EUR/mo)
- this is my first time setting-up a hetzner dedicated server (I inherited both hetzner1 & hetzner2)
- I was hoping for something like an in-browser KVM/VNC like SolusVM offers, or to feed it 'cloud-init' like hetzner cloud offers, but I don't see that as an option
- looks like they have scripts for installing a few distros. docs here https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/
- and here's the general docs on their dedicated servers https://docs.hetzner.com/robot/dedicated-server/getting-started/root-server-guide/
- ok, they do offer a KVM-over-IP for installing custom distros. Apparently a technician has to physically plug-in to the machine, and you're given 3 free hours. After that it's 8.40 EUR/hr https://docs.hetzner.com/robot/dedicated-server/maintainance/kvm-console/
- the new server is now listed on our "Hetzner Robot" Server Page https://robot.hetzner.com/server
- the old server is listed as "EX41S-SSD #XXXXXX"
- the new server is listed more simply as "Server Auction #XXXXXXX"
- I sent another support request to hetzner asking which type of hardware we have
Hi,

I have another question about possible disk upgrades to our newly-purchased server "Server Auction #2443019". Can you please tell us what type of server we ordered? Is it an AX? DX? EX? GEX? PX? RX? SX? Or what?

I ask because your documentation page on what disk upgrade options are available has a lot of caveats (eg "only for the following servers" or "not available for XYZ"), so to know what our options are I need to know which server type we have.

* https://docs.hetzner.com/robot/dedicated-server/general-information/root-server-hardware/#drives

Please let us know which server type we have, so that I can figure out what disk upgrade options are available (and their prices).

Thank you,
- I probably should have done this before, but I checked the hetzner cloud offerings too
- I didn't check them before because I expected our workload to be memory-bound, and I do want to stick with 64G of RAM
- indeed, the cheapest cloud server with 64G of RAM is 95.99 EUR/mo. That also gives us 16 dedicated (AMD) cores and a 360 GB disk, and of course it's easier to upgrade, but it's too expensive
- as said above, we *could* probably get-by with 16G of RAM. That's 23.99 EUR/mo with dedicated vCPU. 32G with dedicated vCPU is 47.99 EUR/mo. With shared vCPU, we get 16G RAM for 15.90 EUR/mo or 32G for 31.90 EUR/mo. But we can't increase the RAM beyond 32G on the shared vCPU systems
- therefore, I do think going with the dedicated server is our best bet due to the value on the RAM that we get.
- I also asked hetzner sales about possible memory upgrades. I don't think we'll need more than 64G of RAM, but it would be good to know if upgrading is possible
Hi,

I have another question about possible memory upgrades to our newly-purchased server "Server Auction #2443019". Can you please tell us if our current configuration with "4x RAM 16384 MB DDR4" is the maximum RAM that this system can accept? Does the server only have 4x RAM slots? Or are there some empty ones? Is it possible to increase one or more of the RAM slots with >16G RAM chips?

Please let us know what options we have for future memory upgrades on this new dedicated server.

Thank you,
- I tried to shut down the hetzner3 server (to eliminate it as a vector until I've hardened it), but there's only an option to reboot :(
- I gave hetzner my ssh public key at order-time, and, yep, it's set up with the root user by default :(
user@personal:~/tmp/ansible$ ssh root@144.76.164.201 Linux rescue 6.9.7 #1 SMP Thu Jun 27 15:07:37 UTC 2024 x86_64 -------------------- Welcome to the Hetzner Rescue System. This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel. You can install software like you would in a normal system. To install a new operating system from one of our prebuilt images, run 'installimage' and follow the instructions. Important note: Any data that was not written to the disks will be lost during a reboot. For additional information, check the following resources: Rescue System: https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system Installimage: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images other articles: https://docs.hetzner.com/robot -------------------- Rescue System (via Legacy/CSM) up since 2024-07-31 09:16 +02:00 Hardware data: CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8) Memory: 64099 MB Disk /dev/nvme0n1: 512 GB (=> 476 GiB) doesn't contain a valid partition table Disk /dev/nvme1n1: 512 GB (=> 476 GiB) doesn't contain a valid partition table Total capacity 953 GiB with 2 Disks Network data: eth0 LINK: yes MAC: 90:1b:0e:c4:28:b4 IP: 144.76.164.201 IPv6: 2a01:4f8:200:40d7::2/64 Intel(R) PRO/1000 Network Driver root@rescue ~ #
- here's our disk info. So each disk is already only 476.9G. It'll be a bit less after we put a filesystem on it, I'm sure
root@rescue ~ # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS loop0 7:0 0 3.1G 1 loop nvme0n1 259:0 0 476.9G 0 disk nvme1n1 259:1 0 476.9G 0 disk root@rescue ~ #
- I'm reading through the guide on hetzner's `installimage` tool https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/
- the guide suggests a couple commands to see if we have a hardware raid. I didn't think we did, and this appears to confirm it
root@rescue ~ # megacli -LDInfo -Lall -Aall Exit Code: 0x00 root@rescue ~ # root@rescue ~ # arcconf GETCONFIG 1 LD Controllers found: 0 Invalid controller number. root@rescue ~ #
- it appears we don't have any software RAIDs already setup either
root@rescue ~ # ls /dev/md* ls: cannot access '/dev/md*': No such file or directory root@rescue ~ #
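- /proc/mdstat would show the same thing; for reference, the usual checks (not captured here) are:
cat /proc/mdstat
mdadm --detail --scan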
- quick disk tests show buffered reads of ~2.3 GB/s (roughly 18.6 Gb/s). At first glance that looked slower than the advertised "6 Gb/s" on our prod server, but the units don't match: 6 Gb/s is the SATA interface limit in bits, so this NVMe is actually well past it (though admittedly I never benchmarked the prod disks)
root@rescue ~ # hdparm -Ttv /dev/nvme0n1 /dev/nvme0n1: readonly = 0 (off) readahead = 256 (on) geometry = 488386/64/32, sectors = 1000215216, start = 0 Timing cached reads: 35298 MB in 1.97 seconds = 17911.22 MB/sec Timing buffered disk reads: 6992 MB in 3.00 seconds = 2330.11 MB/sec root@rescue ~ #
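- note that hdparm only measures sequential reads; if I want a fuller picture of the NVMe's random IO later, something like fio would do it (a sketch only, I did not run this on the box):
apt-get -y install fio
fio --name=randread --filename=/dev/nvme0n1 --readonly --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=30 --time_based --group_reporting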
- ok, I ran installimage
root@rescue ~ # installimage
- I selected "Debian"
- I selected "Debian-1205-bookworm-amd64-base"
- it dumped me into a midnight commander editor and said I could save with F10
- it said that "by default all disks are used for software raid" — that sounds good
- the "standard config" file that it gave me was too long to try to copy & paste, but the default RAID looked like what we wanted
- the default partition layout had a 32G swap, 1G /boot, and the rest allocated to '/'
- hetzner2 has a 488M /boot that's currently 84% full (using 386M). The same 386M in a 1G /boot would be ~38% full, which is much better. That sounds good.
- hetzner2 has a 32G swap. It does get used, but it's currently using <1G. 32G should be fine.
- I thought about setting up an LVM. It would be a better idea than having everything in one big '/' partition, but it would inevitably require more maintenance. For the sake of keeping things simple for a non-profit that has no sysadmins on staff, I'm going to stick to "just allocate the rest to '/'"
- oh, cool, the config file said what our disks are (if it can be trusted). It says: SAMSUNG MZVLB512HAJQ
- looks like they're from 2017 https://ssd.userbenchmark.com/SpeedTest/401452/SAMSUNG-MZVLB512HAJQ-000L2
- samsung advertises them as having 3.5 GB/s sequential read + 2.9 GB/s sequential write + 460K random read IOPS + 500K random write IOPS
- I decided to accept all these defaults and proceed with the install
- oh, except I did change the hostname line
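- for reference, the key lines of the installimage config ended up roughly like this (reconstructed from memory, so exact values may differ; the hostname value shown is a placeholder):
DRIVE1 /dev/nvme0n1
DRIVE2 /dev/nvme1n1
SWRAID 1
SWRAIDLEVEL 1
BOOTLOADER grub
HOSTNAME hetzner3
PART swap swap 32G
PART /boot ext3 1G
PART / ext4 all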
- ah shit, how do I send an F10 command over ssh? stupid midnight commander editor...
- I pressed Ctrl+[[ and that seemed to work
- the install finished in a few minutes
Hetzner Online GmbH - installimage Your server will be installed now, this will take some minutes You can abort at any time with CTRL+C ... : Reading configuration done : Loading image file variables done : Loading debian specific functions done 1/16 : Deleting partitions done 2/16 : Test partition size done 3/16 : Creating partitions and /etc/fstab done 4/16 : Creating software RAID level 1 done 5/16 : Formatting partitions : formatting /dev/md/0 with swap done : formatting /dev/md/1 with ext3 done : formatting /dev/md/2 with ext4 done 6/16 : Mounting partitions done 7/16 : Sync time via ntp done : Importing public key for image validation done 8/16 : Validating image before starting extraction done 9/16 : Extracting image (local) done 10/16 : Setting up network config done 11/16 : Executing additional commands : Setting hostname done : Generating new SSH keys done : Generating mdadm config done : Generating ramdisk done : Generating ntp config done 12/16 : Setting up miscellaneous files done 13/16 : Configuring authentication : Fetching SSH keys done : Disabling root password done : Disabling SSH root login with password done : Copying SSH keys done 14/16 : Installing bootloader grub done 15/16 : Running some debian specific functions done 16/16 : Clearing log files done INSTALLATION COMPLETE You can now reboot and log in to your new system with the same credentials that you used to log into the rescue system. root@rescue ~ #
- I have to say that I was happy that it generated new ssh host keys, and that it said it was importing a public key and validating the image before extraction.
- oh, and they disabled root password and ssh login with password. that's better than I expected from them. good.
- I ran `shutdown -h now`, and the server didn't come back. That's actually good. I wanted to see if I could shut the thing down (the safest machine is a machine that's off, especially before hardening), but the hetzner robot WUI didn't give an option to shutdown (only to reboot).
- after waiting 5 minutes with no pongs to my pings, I logged into the hetzner robot wui -> server -> reset tab -> Execute an automatic hardware reset, and I clicked "Send" https://robot.hetzner.com/server
- after a few more minutes with still no pongs to my pings, I logged into the hetzner robot wui -> server -> WOL tab -> clicked "Send WOL signal to server"
- after about 2 minutes, I started getting pings and was able to ssh-in as the 'root' user
- as soon as I got a shell from ssh, I quickly pasted-in my "jumpstart" provisioning and hardening commands to create a user for me, do basic ssh hardening, and setup a basic firewall to block everything except ssh
adduser maltfield --disabled-password --gecos ''
groupadd sshaccess
gpasswd -a maltfield sshaccess
mkdir /home/maltfield/.ssh/
echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== maltfield@ose" > /home/maltfield/.ssh/authorized_keys
chown -R maltfield:maltfield /home/maltfield/.ssh
chmod -R 0600 /home/maltfield/.ssh
chmod 0700 /home/maltfield/.ssh

# without this, apt-get may get stuck
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get -y install iptables iptables-persistent
apt-get -y purge nftables
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j DROP
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
iptables -A INPUT -j DROP
iptables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner 42 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner 1000 -j ACCEPT
iptables -A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
iptables -A OUTPUT -j DROP

ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A INPUT -s ::1/128 -d ::1/128 -j DROP
ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -A INPUT -j DROP
ip6tables -A OUTPUT -s ::1/128 -d ::1/128 -j ACCEPT
ip6tables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
ip6tables -A OUTPUT -m owner --uid-owner 42 -j ACCEPT
ip6tables -A OUTPUT -m owner --uid-owner 1000 -j ACCEPT
ip6tables -A OUTPUT -j DROP

iptables-save > /etc/iptables/rules.v4
ip6tables-save > /etc/iptables/rules.v6

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.orig.`date "+%Y%m%d_%H%M%S"`
grep 'Port 32415' /etc/ssh/sshd_config || echo 'Port 32415' >> /etc/ssh/sshd_config
grep 'AllowGroups sshaccess' /etc/ssh/sshd_config || echo 'AllowGroups sshaccess' >> /etc/ssh/sshd_config
grep 'PermitRootLogin no' /etc/ssh/sshd_config || echo 'PermitRootLogin no' >> /etc/ssh/sshd_config
grep 'PasswordAuthentication no' /etc/ssh/sshd_config || echo 'PasswordAuthentication no' >> /etc/ssh/sshd_config
systemctl restart sshd.service

apt-get -y upgrade
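- before closing the root session, it's worth a couple of sanity checks so a typo in sshd_config can't lock us out (keep the existing session open while testing a new login); something like:
sshd -t && echo 'sshd config OK'
iptables -L INPUT -n -v
grep -E '^(Port|AllowGroups|PermitRootLogin|PasswordAuthentication)' /etc/ssh/sshd_config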
- I added this new entry to my ose VM's /home/user/.ssh/config
Host hetzner3
   Hostname 144.76.164.201
   Port 32415
   ForwardAgent yes
   User maltfield
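- with that entry in place, connecting (whenever the box is powered on) is just:
ssh hetzner3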
- I then gave my user sudo permission
root@mail ~ # cp /etc/sudoers /etc/sudoers.20240731.orig root@mail ~ # root@mail ~ # visudo root@mail ~ # root@mail ~ # diff /etc/sudoers.20240731.orig /etc/sudoers 47a48 > maltfield ALL=(ALL:ALL) NOPASSWD:ALL root@mail ~ #
- alright, basic hardening is done.
- That's all I really wanted to achieve for now. Next I'd like to prepare some ansible playbooks to setup the rest of the basic hardening
- for now, I just want to leave this machine off in the meantime
- I attempted to shut it down again
- I left a ping open for 147 minutes, and I never got a pong back. So I'd say it's off. Great!
- it appears that I just have to trigger a WOL on the hetzner robot WUI to turn it back on, which I'll do after I spend some time working on the ansible roles & playbooks
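- if powering it on/off becomes a frequent thing, the Hetzner Robot webservice can probably trigger the WOL from a script too. This is untested, and the exact endpoint and auth details are assumptions based on their Robot webservice docs, but roughly:
# ROBOT_WS_USER/ROBOT_WS_PASS are the Robot webservice credentials (assumption; not the normal WUI login)
curl -u "$ROBOT_WS_USER:$ROBOT_WS_PASS" -X POST https://robot-ws.your-server.de/wol/144.76.164.201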
Tue July 30, 2024
- Marcin gave me the go-ahead to order a "hetzner3" server and begin provisioning it with Debian in preparation to migrate all our sites from the CentOS7 hetzner2 server to this new server
- This is going to be an enormous project. When I did the hetzner1 -> hetzner2 migration, I inherited both systems (in 2017). For some reason the websites were split across both servers (plus dreamhost too, iirc?), but I consolidated everything onto "hetzner2" and canceled "hetzner1" in 2018 https://wiki.opensourceecology.org/index.php?title=OSE_Server&oldid=298909#Assessment_of_Server_Options
- I'll be using ansible to assist in provisioning this server (and hopefully make it easier to provision future servers). Marcin expressed interest in lowering this barrier for others, as well.
- I noticed that 5 years ago I created a repo for OSE's ansible playbooks, but it's empty.
- I just added the LICENSE to this repo, and I plan to use it to publish our ansible roles/playbooks
- First thing I need to do is decide which server to buy from hetzner's dedicated server offerings
- holy crap, not only are their server auctions *much* cheaper per month, they also don't have a one-time-setup fee (usually ~$50-$200?)
- I've written pretty extensively in the past about what specs I'd be looking to get in a future OSE Server migration https://wiki.opensourceecology.org/index.php?title=OSE_Server&oldid=298909#OSE_Server_and_Server_Requirements
- In 2018, I said we'd want min 2-4 cores
- In 2018, I said we'd want min 8-16 G RAM
- In 2018, I said we'd want min ~200G disk
- Honestly, I expect that the lowest offerings of a dedicated server in 2024 are probably going to suffice for us, but what I'm mostly concerned about is the disk.
- even last week when I did the yum updates, I nearly filled the disk just by extracting a copy of our backups. Currently we have two 250G disks in a software RAID-1 (mirror) array. That gives us a usable 197G
- it's also provisioned with all the data on '/'. It would be smart if we set up an LVM
- It's important to me that we double this at-least, but I'll see if there's any deals on 1TB disks or larger
- also what we currently have is a 6 Gb/s SSD, so I don't want to downgrade that by going to a spinning-disk HDD. NVMe might be a welcome upgrade. I/O wait is probably a bottleneck, but not currently one that's causing us agony
- I spent some time reviewing the munin graphs
- load rarely ever touches 3. Most of the time it hovers between 0.2 - 1. So I agree that 4 cores is fine for us now.
- most of these auctions have an Intel Core i7-4770, which is a 4-core / 8-thread proc. That should be fine.
- somehow our varnish hits are way down. They used to average >80%, but currently they're down to 28-44%
- I documented these charts and my findings on a new Hetzner3 page
- I looked through the listings in the server auctions
- I don't want one that's only 32G RAM (few of these are)
- It looks like some have "2 x SSD SATA 250 GB" and some have "2 x SSD M.2 NVMe 512 GB". If we can, let's get the NVMe disks with better io
- there is one with "2 x HDD SATA 2,0 TB Enterprise". More space would be nice, but not at the sacrifice of io
- questions I have for hetzner:
- how many disk slots are there? Can we add more disks in the future?
- by default, do all these systems have RAID-1? Do we have other RAID options?
- oh, actually, there was only one server available for less than 38 EUR/mo that had the 2x 512GB NVME
- I went ahead and ordered it
- I also sent a separate message to hetzner sales asking them for detailed info about the different read & write speeds of their HDD, SSD, and NVME offerings in dedicated servers
- I sent an email to Marcin
Hey Marcin,

I just ordered a dedicated server from Hetzner with the following specs:

* Intel Core i7-6700
* 2x SSD M.2 NVMe 512 GB
* 4x RAM 16384 MB DDR4
* NIC 1 Gbit Intel I219-LM
* Location: Germany, FSN1
* Rescue system (English)
* 1 x Primary IPv4

While they had plenty of servers available with the i7-6700 and 16G of RAM, they only had one with 2x 512 GB NVMe disks (the others were just "SSD" disks). Those NVMe disks should give us a performance boost, so I snagged it while it was available.

I did some reviews of our munin charts to determine our hetzner3 server's needs. For more info, see

* https://wiki.opensourceecology.org/wiki/Hetzner3

Please let me know if you have any questions about this server.

Thank you,

Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org
Fri July 26, 2024
- I started the CHG-2024-07-26_yum_update today at 11:00
- pre-state proof shows we have lots of outdated system packages, as expected
[root@opensourceecology ~]# yum list updates ... xz-libs.x86_64 5.2.2-2.el7_9 updates yum.noarch 3.4.3-168.el7.centos base yum-cron.noarch 3.4.3-168.el7.centos base yum-plugin-fastestmirror.noarch 1.1.31-54.el7_8 base yum-utils.noarch 1.1.31-54.el7_8 base zlib.x86_64 1.2.7-21.el7_9 updates [root@opensourceecology ~]#
- I tried to check the backups log, but it was empty :/
[root@opensourceecology ~]# cat /var/log/backups/backup.log [root@opensourceecology ~]#
- ok, looks like it rotated already; this file shows a 20.424G backup file successfully uploaded to backblaze with rclone
[root@opensourceecology ~]# ls /var/log/backups/ backup.lo backup.log-20240628.gz backup.log-20240714.gz backup.log backup.log-20240629.gz backup.log-20240715.gz backup.log-20240615.gz backup.log-20240701.gz backup.log-20240716.gz backup.log-20240616.gz backup.log-20240702.gz backup.log-20240718.gz backup.log-20240617.gz backup.log-20240704.gz backup.log-20240719.gz backup.log-20240619.gz backup.log-20240706.gz backup.log-20240721.gz backup.log-20240621.gz backup.log-20240707.gz backup.log-20240722.gz backup.log-20240622.gz backup.log-20240708.gz backup.log-20240724.gz backup.log-20240623.gz backup.log-20240709.gz backup.log-20240725.gz backup.log-20240625.gz backup.log-20240711.gz backup.log-20240726 backup.log-20240626.gz backup.log-20240712.gz backup.log-20240627.gz backup.log-20240713.gz [root@opensourceecology ~]# [root@opensourceecology ~]# tail -n20 /var/log/backups/backup.log-20240726 * daily_hetzner2_20240726_072001.tar.gpg:100% /20.424G, 2.935M/s, - 2024/07/26 09:50:31 INFO : daily_hetzner2_20240726_072001.tar.gpg: Copied (new) 2024/07/26 09:50:31 INFO : Transferred: 20.424G / 20.424 GBytes, 100%, 2.979 MBytes/s, ETA 0s Transferred: 1 / 1, 100% Elapsed time: 1h57m0.8s real 117m1.219s user 4m20.240s sys 2m9.432s + echo ================================================================================ ================================================================================ ++ date -u +%Y%m%d_%H%M%S + echo 'INFO: Finished Backup Run at 20240726_095031' INFO: Finished Backup Run at 20240726_095031 + echo ================================================================================ ================================================================================ + exit 0 [root@opensourceecology ~]#
- the query of b2 backup files also looks good
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups | grep `date "+%Y%m%d"` daily_hetzner2_20240726_072001.tar.gpg [root@opensourceecology ~]# date -u Fri Jul 26 16:03:55 UTC 2024 [root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups daily_hetzner2_20240724_072001.tar.gpg daily_hetzner2_20240725_072001.tar.gpg daily_hetzner2_20240726_072001.tar.gpg monthly_hetzner2_20230801_072001.tar.gpg monthly_hetzner2_20230901_072001.tar.gpg monthly_hetzner2_20231001_072001.tar.gpg monthly_hetzner2_20231101_072001.tar.gpg monthly_hetzner2_20231201_072001.tar.gpg monthly_hetzner2_20240201_072001.tar.gpg monthly_hetzner2_20240301_072001.tar.gpg monthly_hetzner2_20240401_072001.tar.gpg monthly_hetzner2_20240501_072001.tar.gpg monthly_hetzner2_20240601_072001.tar.gpg monthly_hetzner2_20240701_072001.tar.gpg weekly_hetzner2_20240708_072001.tar.gpg weekly_hetzner2_20240715_072001.tar.gpg weekly_hetzner2_20240722_072001.tar.gpg yearly_hetzner2_20190101_111520.tar.gpg yearly_hetzner2_20200101_072001.tar.gpg yearly_hetzner2_20210101_072001.tar.gpg yearly_hetzner2_20230101_072001.tar.gpg yearly_hetzner2_20240101_072001.tar.gpg [root@opensourceecology ~]#
- that backup is already 8 hours old; so let's bring down the webserver + stop the databases and take a real fresh backup before we do anything
- stopped nginx
[root@opensourceecology ~]# # create dir for logging the change [root@opensourceecology ~]# tmpDir="/var/tmp/CHG-2024-07-26_yum_update" [root@opensourceecology ~]# mkdir -p $tmpDir [root@opensourceecology ~]# [root@opensourceecology ~]# # begin to gracefully shutdown nginx in the background [root@opensourceecology ~]# time nice /sbin/nginx -s quit nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 real 0m0.078s user 0m0.038s sys 0m0.021s [root@opensourceecology ~]# [root@opensourceecology ~]# date -u Fri Jul 26 16:06:37 UTC 2024 [root@opensourceecology ~]#
- stopped DBs
[root@opensourceecology ~]# systemctl status mariadb ● mariadb.service - MariaDB database server Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled) Active: active (running) since Mon 2024-07-22 18:55:28 UTC; 3 days ago Process: 1230 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS) Process: 1099 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS) Main PID: 1229 (mysqld_safe) CGroup: /system.slice/mariadb.service ├─1229 /bin/sh /usr/bin/mysqld_safe --basedir=/usr └─1704 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql ... Jul 22 18:55:25 opensourceecology.org systemd[1]: Starting MariaDB database s.... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: Database M... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: If this is... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:28 opensourceecology.org systemd[1]: Started MariaDB database se.... Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology ~]# systemctl stop mariadb [root@opensourceecology ~]# systemctl status mariadb ● mariadb.service - MariaDB database server Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled) Active: inactive (dead) since Fri 2024-07-26 16:07:43 UTC; 3s ago Process: 1230 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS) Process: 1229 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS) Process: 1099 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS) Main PID: 1229 (code=exited, status=0/SUCCESS) Jul 22 18:55:25 opensourceecology.org systemd[1]: Starting MariaDB database s.... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: Database M... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: If this is... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:28 opensourceecology.org systemd[1]: Started MariaDB database se.... Jul 26 16:07:40 opensourceecology.org systemd[1]: Stopping MariaDB database s.... Jul 26 16:07:43 opensourceecology.org systemd[1]: Stopped MariaDB database se.... Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology ~]#
- the backup is taking a long time. While I wait, I checked `top`, and I see `gzip` is using 80-100% of a single core
- so it seems that gzip is bound by a single core. It could go much faster if the work could be split across multiple cores (parallel processing)
- quick googling while I wait suggests that we could use `pigz` as a replacement to `gzip` to get this (admittedly low priority) performance boost https://stackoverflow.com/questions/12313242/utilizing-multi-core-for-targzip-bzip-compression-decompression
- there are other options too. Apparently xz has native multi-threaded support since v5.2.0 https://askubuntu.com/a/858828
- there's also pbzip2 for bzip, and many others https://askubuntu.com/a/258228
- the other two commands that get stuck on one core are `tar` and `gpg2`
- it looks like gpg also attempts to compress the data itself. That gives us no benefit in our case because we're encrypting a tarball that just contains a bunch of already-compressed tarballs. So we could probably get some performance improvement by telling gpg to skip compression with `--compress-algo none` https://stackoverflow.com/questions/46261024/how-to-do-large-file-parallel-encryption-using-gnupg-and-gnu-parallel
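- if we ever want that speedup, the relevant part of the pipeline could look roughly like this (a sketch only, not what backup.sh actually does today; the real paths and filenames differ, and the timestamp is a placeholder):
# hypothetical sketch: parallel compression with pigz + no gpg re-compression
# (newer gpg releases may also need --pinentry-mode loopback for --passphrase-file in batch mode)
tar -cf - root/backups/sync/daily_hetzner2_YYYYMMDD_HHMMSS \
 | pigz -p "$(nproc)" \
 | gpg --batch --symmetric --compress-algo none --passphrase-file /root/backups/ose-backups-cron.key -o daily_hetzner2_YYYYMMDD_HHMMSS.tar.gz.gpg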
- finally (after ~30 min to generate the encrypted backup file), rclone is using >100% of CPU to upload it, so that's good. Our script does limit upload to 3 MB/s. I guess one improvement would be some argument to bypass that throttle
- it said the upload was going to take just under 2 hours, so I canceled it and manually ran the upload command (minus the throttle)
- upload speeds are now ~27-32 MB/s (so ~10x faster). It says it'll finish in just over 10 minutes.
- upload is done
[root@opensourceecology ~]# time sudo /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log ^C real 33m47.250s user 23m56.551s sys 2m2.866s [root@opensourceecology ~]# [root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20240726_160837.tar.gpg b2:ose-server-backups ... 2024/07/26 16:56:38 INFO : Transferred: 18.440G / 19.206 GBytes, 96%, 22.492 MBytes/s, ETA 34s Transferred: 0 / 1, 0% Elapsed time: 14m0.5s Transferring: * daily_hetzner2_20240726_160837.tar.gpg: 96% /19.206G, 21.268M/s, 36s 2024/07/26 16:57:36 INFO : daily_hetzner2_20240726_160837.tar.gpg: Copied (new) 2024/07/26 16:57:36 INFO : Transferred: 19.206G / 19.206 GBytes, 100%, 21.910 MBytes/s, ETA 0s Transferred: 1 / 1, 100% Elapsed time: 14m58.6s [root@opensourceecology ~]#
- ok, this very durable backup is uploaded; let's proceed
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups | grep `date "+%Y%m%d"` daily_hetzner2_20240726_072001.tar.gpg daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology ~]# date -u Fri Jul 26 16:58:11 UTC 2024 [root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups daily_hetzner2_20240724_072001.tar.gpg daily_hetzner2_20240725_072001.tar.gpg daily_hetzner2_20240726_072001.tar.gpg daily_hetzner2_20240726_160837.tar.gpg monthly_hetzner2_20230801_072001.tar.gpg monthly_hetzner2_20230901_072001.tar.gpg monthly_hetzner2_20231001_072001.tar.gpg monthly_hetzner2_20231101_072001.tar.gpg monthly_hetzner2_20231201_072001.tar.gpg monthly_hetzner2_20240201_072001.tar.gpg monthly_hetzner2_20240301_072001.tar.gpg monthly_hetzner2_20240401_072001.tar.gpg monthly_hetzner2_20240501_072001.tar.gpg monthly_hetzner2_20240601_072001.tar.gpg monthly_hetzner2_20240701_072001.tar.gpg weekly_hetzner2_20240708_072001.tar.gpg weekly_hetzner2_20240715_072001.tar.gpg weekly_hetzner2_20240722_072001.tar.gpg yearly_hetzner2_20190101_111520.tar.gpg yearly_hetzner2_20200101_072001.tar.gpg yearly_hetzner2_20210101_072001.tar.gpg yearly_hetzner2_20230101_072001.tar.gpg yearly_hetzner2_20240101_072001.tar.gpg [root@opensourceecology ~]#
- we have a snapshot of the current state of packages
[root@opensourceecology ~]# time nice rpm -qa &> "${tmpDir}/before.log" real 0m0.716s user 0m0.678s sys 0m0.037s [root@opensourceecology ~]# [root@opensourceecology ~]# echo $tmpDir /var/tmp/CHG-2024-07-26_yum_update [root@opensourceecology ~]# [root@opensourceecology ~]# tail /var/tmp/CHG-2024-07-26_yum_update/before.log libdb-utils-5.3.21-25.el7.x86_64 libuser-0.60-9.el7.x86_64 python-lxml-3.2.1-4.el7.x86_64 net-snmp-agent-libs-5.7.2-48.el7_8.x86_64 epel-release-7-14.noarch perl-parent-0.225-244.el7.noarch libstdc++-devel-4.8.5-39.el7.x86_64 libsodium13-1.0.5-1.el7.x86_64 ncurses-5.9-14.20130511.el7_4.x86_64 e2fsprogs-libs-1.42.9-17.el7.x86_64 [root@opensourceecology ~]#
- I kicked-off the updates. I got a bit of a fright at first when we got "404 Not Found" errors from 484 mirrors, but eventually `yum` found a working mirror. I'm glad we did the updates now, before all the mirrors shut down (CentOS 7 stopped getting full updates years ago, and stopped receiving maintenance updates entirely as of a few weeks ago)
[root@opensourceecology ~]# grep "Error 404" /var/tmp/CHG-2024-07-26_yum_update/update.log | wc -l 484 [root@opensourceecology ~]# [root@opensourceecology ~]# cat /etc/centos-release CentOS Linux release 7.9.2009 (Core) [root@opensourceecology ~]#
- actually, it says it's updating 434 packages total. So I guess some dependencies got added to the 200-odd count before
- ok, the update command finished in just under 4 minutes of wall time
... real 3m56.410s user 2m1.833s sys 0m44.510s [root@opensourceecology ~]#
- post update info
[root@opensourceecology ~]# # log the post-state packages and versions [root@opensourceecology ~]# time nice rpm -qa &> "${tmpDir}/after.log" real 0m0.805s user 0m0.769s sys 0m0.036s [root@opensourceecology ~]# [root@opensourceecology ~]# time nice needs-restarting &> "${tmpDir}/needs-restarting.log" real 0m8.156s user 0m6.956s sys 0m0.652s [root@opensourceecology ~]# time nice needs-restarting -r &> "${tmpDir}/needs-reboot.log" real 0m0.155s user 0m0.104s sys 0m0.051s [root@opensourceecology ~]# [root@opensourceecology ~]# cat /var/tmp/CHG-2024-07-26_yum_update/needs-reboot.log Core libraries or services have been updated: systemd -> 219-78.el7_9.9 dbus -> 1:1.10.24-15.el7 openssl-libs -> 1:1.0.2k-26.el7_9 linux-firmware -> 20200421-83.git78c0348.el7_9 kernel -> 3.10.0-1160.119.1.el7 glibc -> 2.17-326.el7_9.3 Reboot is required to ensure that your system benefits from these updates. More information: https://access.redhat.com/solutions/27943 [root@opensourceecology ~]# [root@opensourceecology ~]# cat /var/tmp/CHG-2024-07-26_yum_update/needs-restarting.log 30842 : /usr/lib/systemd/systemd-udevd 13696 : sshd: maltfield@pts/0 27401 : /bin/bash 744 : /sbin/auditd 19086 : /bin/bash 13692 : sshd: maltfield [priv] 30672 : smtpd -n smtp -t inet -u 13699 : -bash 18035 : su - 27436 : less /root/backups/backup.sh 18036 : -bash 18030 : sudo su - 1484 : /var/ossec/bin/ossec-analysisd 24493 : /bin/bash 21581 : su - 21580 : sudo su - 21582 : -bash 797 : /usr/lib/systemd/systemd-logind 24476 : /bin/bash 1830 : qmgr -l -t unix -u 30673 : proxymap -t unix -u 19119 : sudo su - 24511 : /bin/bash 29833 : local -t unix 27417 : sudo su - 19130 : -bash 1 : /usr/lib/systemd/systemd --system --deserialize 23 29830 : cleanup -z -t unix -u 1500 : /var/ossec/bin/ossec-logcollector 24475 : SCREEN -S upgrade 2150 : /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a 127.0.0.1:6081 -T 127.0.0.1:6082 -S /etc/varnish/secret -u varnish -g varnish -s malloc,40G 2152 : /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a 127.0.0.1:6081 -T 127.0.0.1:6082 -S /etc/varnish/secret -u varnish -g varnish -s malloc,40G 29835 : bounce -z -t unix -u 775 : /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation 27419 : -bash 585 : /usr/lib/systemd/systemd-journald 771 : /usr/sbin/irqbalance --foreground 770 : /usr/sbin/acpid 1170 : /sbin/agetty --noclear tty1 linux 30690 : smtp -t unix -u 778 : /usr/sbin/chronyd 8695 : gpg-agent --daemon --use-standard-socket 24529 : /bin/bash 2121 : /var/ossec/bin/ossec-syscheckd 1806 : /usr/libexec/postfix/master -w 19129 : su - 19065 : /bin/bash 2124 : /var/ossec/bin/ossec-monitord 29832 : trivial-rewrite -n rewrite -t unix -u 19044 : /bin/bash 30693 : smtp -t unix -u 30692 : smtp -t unix -u 30691 : cleanup -z -t unix -u 27418 : su - 1475 : /var/ossec/bin/ossec-execd 19025 : /bin/bash 19024 : SCREEN -S CHG-2024-07-26_yum_update 1458 : /var/ossec/bin/ossec-maild 19023 : screen -S CHG-2024-07-26_yum_update [root@opensourceecology ~]#
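- a quick way to see exactly which packages changed is to diff the before/after snapshots, e.g.:
diff <(sort /var/tmp/CHG-2024-07-26_yum_update/before.log) <(sort /var/tmp/CHG-2024-07-26_yum_update/after.log) | less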
- alright, time to reboot
[root@opensourceecology ~]# reboot Connection to opensourceecology.org closed by remote host. Connection to opensourceecology.org closed. user@personal:~$
- system came back in about 1 minute
- first attempt to load the wiki resulted in a 503 "Error 503 Backend fetch failed" from varnish
- it's not just warming up; apache didn't come up on start
[root@opensourceecology ~]# systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Fri 2024-07-26 17:09:47 UTC; 2min 7s ago Docs: man:httpd(8) man:apachectl(8) Process: 1094 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE) Main PID: 1094 (code=exited, status=1/FAILURE) Jul 26 17:09:47 opensourceecology.org systemd[1]: Starting The Apache HTTP Se.... Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use:... Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use:... Jul 26 17:09:47 opensourceecology.org httpd[1094]: no listening sockets availa... Jul 26 17:09:47 opensourceecology.org httpd[1094]: AH00015: Unable to open logs Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service: main process...E Jul 26 17:09:47 opensourceecology.org systemd[1]: Failed to start The Apache .... Jul 26 17:09:47 opensourceecology.org systemd[1]: Unit httpd.service entered .... Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service failed. Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology ~]#
- it says that the port is already in use
[root@opensourceecology ~]# journalctl -u httpd --no-pager -- Logs begin at Fri 2024-07-26 17:09:34 UTC, end at Fri 2024-07-26 17:15:26 UTC. -- Jul 26 17:09:47 opensourceecology.org systemd[1]: Starting The Apache HTTP Server... Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:443 Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:443 Jul 26 17:09:47 opensourceecology.org httpd[1094]: no listening sockets available, shutting down Jul 26 17:09:47 opensourceecology.org httpd[1094]: AH00015: Unable to open logs Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE Jul 26 17:09:47 opensourceecology.org systemd[1]: Failed to start The Apache HTTP Server. Jul 26 17:09:47 opensourceecology.org systemd[1]: Unit httpd.service entered failed state. Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service failed. [root@opensourceecology ~]#
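- I didn't capture it at the time, but the quick way to see what was already bound to port 443 would have been something like this (on this box, nginx is what's supposed to own 443; apache sits behind varnish/nginx on localhost ports):
ss -tlnp | grep ':443'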
- before I start making changes, I'm going to initiate another backup (and wait at least 30 minutes for the tar to finish)
- I'm going to want to diff the apache configs, so I copied the backup that I made just before the updates into the temp CHG dir
[root@opensourceecology CHG-2024-07-26_yum_update]# mkdir backup_before [root@opensourceecology CHG-2024-07-26_yum_update]# rsync -av --progress /home/b2user/sync.old/daily_hetzner2_20240726_160837.tar.gpg backup_before/ sending incremental file list daily_hetzner2_20240726_160837.tar.gpg 20,622,312,871 100% 127.14MB/s 0:02:34 (xfr#1, to-chk=0/1) sent 20,627,347,744 bytes received 35 bytes 133,510,341.61 bytes/sec total size is 20,622,312,871 speedup is 1.00 [root@opensourceecology CHG-2024-07-26_yum_update]#
- well, unfortunately the wiki being down means I can't reference our docs on how to restore backups, but I managed to figure it out
[root@opensourceecology backup_before]# gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --decrypt daily_hetzner2_20240726_160837.tar.gpg > daily_hetzner2_20240726_160837.tar gpg: AES256 encrypted data gpg: encrypted with 1 passphrase [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# tar -xf daily_hetzner2_20240726_160837.tar [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G daily_hetzner2_20240726_160837.tar.gpg 20G root [root@opensourceecology backup_before]# rm -f daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G root [root@opensourceecology backup_before]#
- to make this easier for the next person, I created a README directly in the backups dir
[root@opensourceecology backups]# cat /root/backups/README.txt 2024-07-26 ========== The process to restore from backups is documented on the wiki * https://wiki.opensourceecology.org/wiki/Backblaze#Restore_from_backups Oh, the wiki is down and you need to restore from backups to restore the wiki? Don't worry, I got you. All backups are stored on Backblaze B2. You can download them with rclone or just by logging into the Backblaze B2 WUI. First decrypt the main wrapper tar with `gpg` gpg --batch --passphrase-file <path-to-symmetric-encrypton-private-key> --decrypt <path-to-encrypted-tarball> > <path-to-decrypted-tarball> For example: gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --decrypt daily_hetzner2_20240726_160837.tar.gpg > daily_hetzner2_20240726_160837.tar Then you can untar the wrapper tarball and the compressed tarball inside of that. For example: tar -xf daily_hetzner2_20240726_160837.tar cd root/backups/sync/daily_hetzner2_20240726_160837/www/ tar -xf www.20240726_160837.tar.gz head var/www/html/www.opensourceecology.org/htdocs/index.php --Michael Altfield <https://michaelaltfield.net.> [root@opensourceecology backups]#
- and I was able to extract the www files from the backups prior to the update
[root@opensourceecology backup_before]# cd root/backups/sync/daily_hetzner2_20240726_160837/www/ [root@opensourceecology www]# [root@opensourceecology www]# ls www.20240726_160837.tar.gz [root@opensourceecology www]# [root@opensourceecology www]# tar -xf www.20240726_160837.tar.gz [root@opensourceecology www]# [root@opensourceecology www]# du -sh * 32G var 19G www.20240726_160837.tar.gz [root@opensourceecology www]#
- oh, actually I want the /etc/ config files
[root@opensourceecology www]# cd ../etc [root@opensourceecology etc]# [root@opensourceecology etc]# tar -xf etc.20240726_160837.tar.gz [root@opensourceecology etc]# [root@opensourceecology etc]# du -sh * 46M etc 13M etc.20240726_160837.tar.gz [root@opensourceecology etc]#
- a diff of the pre-update configs and the current configs shows 4x new files
[root@opensourceecology etc]# diff -ril etc/httpd /etc/httpd diff: etc/httpd/logs: No such file or directory diff: etc/httpd/modules: No such file or directory Only in /etc/httpd/conf.d: autoindex.conf Only in /etc/httpd/conf.d: ssl.conf Only in /etc/httpd/conf.d: userdir.conf Only in /etc/httpd/conf.d: welcome.conf [root@opensourceecology etc]#
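- these look like the stock config files that the updated httpd/mod_ssl packages re-installed (our hardened config had removed them); ssl.conf in particular ships its own Listen 443, which would collide with nginx already bound to 443. Something like this would confirm which packages own them:
rpm -qf /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/autoindex.conf /etc/httpd/conf.d/userdir.conf /etc/httpd/conf.d/welcome.conf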
- I just moved these 4x files out (into our tmp change dir), and tried a restart; it came up
[root@opensourceecology CHG-2024-07-26_yum_update]# mkdir moved_from_etc_httpd [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/autoindex.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/ssl.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/userdir.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/welcome.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# systemctl restart httpd [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2024-07-26 17:59:36 UTC; 4s ago Docs: man:httpd(8) man:apachectl(8) Main PID: 15908 (httpd) Status: "Processing requests..." CGroup: /system.slice/httpd.service ├─15908 /usr/sbin/httpd -DFOREGROUND ├─15910 /usr/sbin/httpd -DFOREGROUND ├─15911 /usr/sbin/httpd -DFOREGROUND ├─15912 /usr/sbin/httpd -DFOREGROUND ├─15913 /usr/sbin/httpd -DFOREGROUND ├─15914 /usr/sbin/httpd -DFOREGROUND ├─15921 /usr/sbin/httpd -DFOREGROUND ├─15927 /usr/sbin/httpd -DFOREGROUND ├─15928 /usr/sbin/httpd -DFOREGROUND ├─15936 /usr/sbin/httpd -DFOREGROUND ├─15937 /usr/sbin/httpd -DFOREGROUND ├─15938 /usr/sbin/httpd -DFOREGROUND └─15939 /usr/sbin/httpd -DFOREGROUND Jul 26 17:59:36 opensourceecology.org systemd[1]: Starting The Apache HTTP Se.... Jul 26 17:59:36 opensourceecology.org systemd[1]: Started The Apache HTTP Server. Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology CHG-2024-07-26_yum_update]#
- I was able to load and edit the wiki; I spent some time adding some updates to the CHG article https://wiki.opensourceecology.org/wiki/CHG-2020-05-04_yum_update
- for some reason my browser keeps locking-up when all I'm trying to do is edit the text in the textarea for ^ this wiki article. I don't use the wysiwyg editor. I'm literally just editing text in a textarea; that shouldn't require any processing
- It took me ~20 minutes just to make a few changes to one wiki article because the page on firefox kept locking-up, sometimes displaying a spinning circle over the page
- I launched a new DispVM with *only* firefox running and *only* one tab open in firefox. The issue persisted, and the VM with the (idle) firefox on the edit page was taxed with 20-60% CPU usage; something is definitely wrong, but it's unclear if the bug is on our mediawiki server, my firefox client, or both
- anyway, I'm continuing with the validation steps
- I was successfully able to load the frontpage of all the 9x websites
- the logo at the top (and bottom) of https://oswh.opensourceecology.org/ was missing, but I'm not sure if that was the case before the updates or not
- I simply get a 404 on the image http://www.opensourcewarehouse.org/wp-content/uploads/2013/02/headfooter-logonew.png
- I guess the domain is wrong; we don't appear to use opensourcewarehouse.org anymore, so I guess this was an issue that predates our updates now
- everything else looked good
- I logged into the munin. It loads fine
- I do see some gaps in the mysql charts where everything drops to 0 for a few hours, which I guess is when (and why) Marcin had to do the reboots. My job isn't to investigate that now, but I'm just making a note here
- otherwise munin is working; validated.
- I logged into awstats. It loads fine
- I just quickly scanned the main pages for www.opensourceecology.org and wiki.opensourceecology.org; they look fine
- I already tested edits on wiki.opensourceecology.org; they're working (setting aside the client-side lag)
- I was successfully able to make a trivial change to the main wordpress site, and then revert that change https://www.opensourceecology.org/offline-wiki-zim-kiwix/
- the only thing left is the backups, which have been running in the background since shortly after the reboot
- the backups finished being created successfully
- the backups are currently being uploaded at the rate-limited 3 MB/s. they're at 39% now, and estimated to finish uploading in 1h10m from now.
- the upload is the last step; that's good enough for me to consider the backups functional
- that completes our validation; I think it's safe to mark this change as successful
- I sent an update email to Marcin & Catarina
Hey Marcin & Catarina,

I've finished updating the system packages on the hetzner2 server. It's a very good thing that we did this, because your server tried and failed to download its updates from 484 mirrors before it finally found a server that it could download its updates from.

As I mentioned in Nov 2022, your server runs CentOS 7, which stopped receiving "Full Updates" by Red Hat in Aug 2020. As of Jun 2024, it is no longer going to be updated in any way (security, maintenance, etc). At some point in the future, I guess all of their update servers will go down too. We're lucky at least one was still online.

* https://wiki.opensourceecology.org/wiki/Maltfield_Log/2022#Wed_November_02.2C_2022

Today I was successfully able to update 434 system packages onto hetzner2. I did some quick validation of a subset of your websites, and I only found a couple minor errors

1. The header & footer images of oswh don't load https://oswh.opensourceecology.org/
2. Editing the wiki sometimes causes my browser to lock-up; it's not clear if this is a pre-existing issue, or if the issue is caused by your server or my client

I did not update your server's applications that cannot be updated by the package manager (eg wordpress, mediawiki, etc). If you don't detect any issues with your server, then I would recommend that we do the application upgrade simultaneously with a migration to a new server running Debian.

I'd like to stress again the urgency of the need to migrate off of CentOS 7. Besides the obvious security risks of running a server that is no longer receiving security patches, at some point in the likely-not-too-distant future, your server is going to break and it will be extremely non-trivial to fix it. The deadline for migrating was in 2020. I highly recommend prioritizing a project to migrate your server to a new Debian server ASAP.

Please spend some time testing your various websites, and let me know if you experience any issues.

Thank you,

Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org
- I confirmed the list of updates on the server is now empty
[root@opensourceecology CHG-2024-07-26_yum_update]# yum list updates Loaded plugins: fastestmirror, replace Loading mirror speeds from cached hostfile * base: ftp.plusline.net * epel: mirrors.n-ix.net * extras: ftp.plusline.net * updates: mirror.checkdomain.de [root@opensourceecology CHG-2024-07-26_yum_update]#
- I'm considering the change successful
- looks like my tmp change dir pushed the disk to 86% capacity; let's clean that up
[root@opensourceecology CHG-2024-07-26_yum_update]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 32G 0 32G 0% /dev tmpfs 32G 0 32G 0% /dev/shm tmpfs 32G 17M 32G 1% /run tmpfs 32G 0 32G 0% /sys/fs/cgroup /dev/md2 197G 161G 27G 86% / /dev/md1 488M 386M 77M 84% /boot tmpfs 6.3G 0 6.3G 0% /run/user/1005 [root@opensourceecology CHG-2024-07-26_yum_update]# ls after.log before.log needs-reboot.log update.log backup_before moved_from_etc_httpd needs-restarting.log [root@opensourceecology CHG-2024-07-26_yum_update]# du -sh * 28K after.log 70G backup_before 28K before.log 28K moved_from_etc_httpd 4.0K needs-reboot.log 4.0K needs-restarting.log 216K update.log [root@opensourceecology CHG-2024-07-26_yum_update]# ls before.log before.log [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# ls backup_before/ daily_hetzner2_20240726_160837.tar root [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# du -sh backup_before/* 20G backup_before/daily_hetzner2_20240726_160837.tar 51G backup_before/root [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# ls backup_before/root backups [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# rm -rf backup_before/root [root@opensourceecology CHG-2024-07-26_yum_update]#
- great, now we're down to 59%
[root@opensourceecology CHG-2024-07-26_yum_update]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 32G 0 32G 0% /dev tmpfs 32G 0 32G 0% /dev/shm tmpfs 32G 17M 32G 1% /run tmpfs 32G 0 32G 0% /sys/fs/cgroup /dev/md2 197G 110G 78G 59% / /dev/md1 488M 386M 77M 84% /boot tmpfs 6.3G 0 6.3G 0% /run/user/1005 [root@opensourceecology CHG-2024-07-26_yum_update]#
Wed July 24, 2024
- Marcin contacted me a few days ago saying that the server needs reboots again
- I found that the last time we did a system packages update was in 2020, over 4 years ago. I strongly recommended that we update the system packages, and probably the web applications as well
- here's the link to the last time I updated the system packages in May 2020 https://wiki.opensourceecology.org/wiki/CHG-2020-05-04_yum_update
- I also noted that CentOS is now not only EOL, but it's also no longer receiving (security) updates.
- I warned Marcin about this approaching deadline in Nov 2022, and urged him to migrate to a new OS before 2024.
In my prior work at OSE, I've done my best to design your systems to be robust and "well oiled" so that they would run for as long as possible with as little maintenance as possible. However, code rots over time, and there's only so long you can hold-off before things fall apart.

Python 2.7.5 was End-of-Life'd on 2020-01-01, and it no longer receives any updates.

* https://en.wikipedia.org/wiki/History_of_Python

CentOS 7.7 was released 2019-09-17. "Full Updates" stopped 2020-08-06, and it will no longer receive any maintenance updates after 2024-06-30.

* https://wiki.centos.org/About/Product

At some point, you're going to want to migrate to a new server with a new OS. I strongly recommend initiating this project before 2024.
- Here's the log entry https://wiki.opensourceecology.org/wiki/Maltfield_Log/2022#Wed_November_02.2C_2022
- I told Marcin to budget for ~$10,000 to migrate to a new server, as it's going to be a massive project that will likely require more than a month of full-time work to complete the migration
- Marcin said I should go ahead and prepare a CHG "ticket" article for the upgrade and schedule a time to do it
- I prepared a change ticket for updating the system packages on Friday https://wiki.opensourceecology.org/wiki/CHG-2024-07-26_yum_update
- I also noticed that I kept getting de-auth'd every few minutes on the wiki. That's annoying. Hopefully the updates will make this (and other) issues go away.
- If we did a migration to debian, then we'd need to migrate to a new server
- previously when we migrated from hetzner1 to hetzner2, we got a 15x increase in RAM (from 4GB to 64GB). And the price of both servers was the same!
- I was expecting the next jump would have similar results: we'd migrate to a new server that costs the same for much better specs, but that's not looking like it's going to be the case :(
- Here's the currently-offered dedicated servers at hetzner https://www.hetzner.com/dedicated-rootserver/
- Currently we have 8-cores, 64G RAM, and two 250G disks in a RAID-1 software array. We pay 39 EUR/mo
- The cheapest dedicated server (EX44) currently is 46.41 EUR/month and comes with 14-cores, 64G RAM, and 2x 512G disks. That should meet our requirements https://www.hetzner.com/dedicated-rootserver/ex44/configurator/#/
- oh crap, we'd be downgrading the proc from the i7 (Intel® Core™ i7-6700) to an i5 (Intel® Core™ i5-13500)
- I'd have to check the munin charts, but I would be surprised if we ever break a load of 2, so that's still probably fine.
- I met with Marcin tonight to discuss [a] the system-level package upgrades, [b] the application (eg wordpress, mediawiki, etc) upgrades, and [c] the server migration
- I recommended that Marcin do the updates on staging, and described the risk of not doing it
- the problem is that the current staging environment is down, and it may take a few days to restore it
- the risk is maybe a few days of downtime, instead of a smaller, planned change window during the update
- we agreed that I'll do the system-level package upgrades direct-to-production; Marcin accepted the risk of a few days of downtime
- Marcin also mentioned that Hetzner has a "server auction" page, which has some more servers that meet our needs at a slightly discounted price https://www.hetzner.com/sb/
- actually many of these are 37.72 EUR/mo, so they're actually *cheaper* than our current 39 EUR/mo. Great!
- there's >3 pages of servers for this 37.72 EUR/mo price. One of them has 2x 4TB drives (though it looks like spinning disks). This is a server graveyard built-to-spec for previous customers, it seems. We should be able to find one that meets our needs, so that means we'll easily double our disk and save ~15 EUR per year. Cool :)