Maltfield Log/2024 Q3
My work log from the third quarter of the year 2024. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.
See Also
Wed July 31, 2024
- This morning I woke-up to some emails from Hetzner indicating that our Hetzner3 order is finished, and responding to my questions
>1. By default, is the two disks configured in a RAID1 array?
Servers from server auction are without pre-installed OS, so that no Software-Raid is pre-configured.
>2. Do we have any other RAID options?
The server have two disks, so that you could install an Linux OS via installimage script. There you would have the option to install it without raid or raid1 or raid0 is possible with two disks. Please find information about installimage script here: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/
>3. How many additional (empty/unused) disk slots does this dedicated server have? What options would we have for adding additional disks to this machine in the future, if needed
You can add one additional NVMe or up to two additional sata SSD or sata HDD.
>4. We ordered this because it has "M.2 NVME disks" as opposed to the "SSD" disks. Can you confirm that the NVME disks are faster than "SSD" disks? If not, please cancel our order and we'll purchase another machine with 3x 512G SSD disks
Usually NVMe SSD are faster than sata ssd.
>5. Currently we have another dedicated server with "2 x 250 GB SATA 6 Gb/s SSD". Can you please tell us the "Gb/s" throughput for this server's disks?
Unfortunately I don't have information about it. You should test it yourself on your server.
- so it looks like we did well snagging the auction for the last available, lowest-priced server with "2x SSD M.2 NVMe 512 GB"
- I checked their server auction page again, and I do still see one server available at their lowest 37.72 EUR/mo price with 2x 512G NVMe disks, so I guess one listing doesn't necessarily mean that there's only one server available.
- after migration, we should end-up with a 25% full disk. If we ~triple our current disk usage before we retire this server, we have the ability to add two more non-NVMe SATA SSD disks in another RAID1, which we can partition-up as-needed for our backups, tmp files, etc (hopefully we can keep www and DB on the faster NVMe disks)
- their hardware page has info on the addon SSD disks that they lease, and their prices https://docs.hetzner.com/robot/dedicated-server/general-information/root-server-hardware/#drives
- it looks like their cheapest non-NVMe SSD is 8.50 EUR/mo for a 1T disk. Of course, we would need two for the RAID, so that means we can 4x our disk space in the future for an additional 17 EUR/mo. They also have a 3.84 TB SATA SSD for 37 EUR/mo, and they have both 16T and 22T SATA HDDs for 20.50 and 27.00 EUR/mo, respectively.
- we also get a free 100GB "storage box" along with our purchase, which I guess is some external NFS mount. It's probably slow as hell, but we *could* use it for something like our two local copies of encrypted backup data. We also have this as part of our hetzner2 plan, but we don't use it.
- the free 100G storage box is named "BX10". We can even increase this to a BX11 (1T @ 3.20 EUR/mo), BX21 (5T @ 10.90 EUR/mo), BX31 (20.80 EUR/mo), or BX41 (40.60 EUR/mo)
- this is my first time setting-up a hetzner dedicated server (I inherited both hetzner1 & hetzner2)
- I was hoping for something like an in-browser KVM/VNC like SolusVM offers, or to feed it 'cloud-init' like hetzner cloud offers, but I don't see that as an option
- looks like they have scripts for installing a few distros. docs here https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/
- and here's the general docs on their dedicated servers https://docs.hetzner.com/robot/dedicated-server/getting-started/root-server-guide/
- ok, they do offer a KVM-over-IP for installing custom distros. Apparently a technician has to physically plug-in to the machine, and you're given 3 free hours. After that it's 8.40 EUR/hr https://docs.hetzner.com/robot/dedicated-server/maintainance/kvm-console/
- the new server is now listed on our "Hetzner Robot" Server Page https://robot.hetzner.com/server
- the old server is listed as "EX41S-SSD #XXXXXX"
- the new server is listed more simply as "Server Auction #XXXXXXX"
- I sent another support request to hetzner asking which type of hardware we have
Hi,

I have another question about possible disk upgrades to our newly-purchased server "Server Auction #2443019". Can you please tell us what type of server we ordered? Is it an AX? DX? EX? GEX? PX? RX? SX? Or what?

I ask because your documentation page on what disk upgrade options are available has a lot of caveats (eg "only for the following servers" or "not available for XYZ"), so to know what our options are I need to know which server type we have.

* https://docs.hetzner.com/robot/dedicated-server/general-information/root-server-hardware/#drives

Please let us know which server type we have, so that I can figure out what disk upgrade options are available (and their prices).

Thank you,
- I probably should have done this before, but I checked the hetzner cloud offerings too
- I didn't check them before because I expected our workload to be memory-bound, and I do want to stick with 64G of RAM
- indeed, the cheapest cloud server with 64G of RAM is 95.99 EUR/mo. That also gives us 16 dedicated (AMD) cores and a 360 GB disk, and of course it's easier to upgrade, but it's too expensive
- as said above, we *could* probably get-by with 16G of RAM. That's 23.99 EUR/mo with dedicated vCPU. 32G with dedicated vCPU is 47.99 EUR/mo. With shared vCPU, we get 16G RAM for 15.90 EUR/mo or 32G for 31.90 EUR/mo. But we can't increase the RAM beyond 32G on the shared vCPU systems
- therefore, I do think going with the dedicated server is our best bet due to the value on the RAM that we get.
- I also asked hetzner sales about possible memory upgrades. I don't think we'll need more than 64G of RAM, but it would be good to know if upgrading is possible
Hi,

I have another question about possible memory upgrades to our newly-purchased server "Server Auction #2443019". Can you please tell us if our current configuration with "4x RAM 16384 MB DDR4" is the maximum RAM that this system can accept? Does the server only have 4x RAM slots? Or are there some empty ones? Is it possible to increase one or more of the RAM slots with >16G RAM chips?

Please let us know what options we have for future memory upgrades on this new dedicated server.

Thank you,
- I tried to shut down the hetzner3 server (to eliminate it as a vector until I've hardened it), but there's only an option to reboot :(
- I gave hetzner my ssh public key at order-time, and, yep, it's set up with the root user by default :(
user@personal:~/tmp/ansible$ ssh root@144.76.164.201 Linux rescue 6.9.7 #1 SMP Thu Jun 27 15:07:37 UTC 2024 x86_64 -------------------- Welcome to the Hetzner Rescue System. This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel. You can install software like you would in a normal system. To install a new operating system from one of our prebuilt images, run 'installimage' and follow the instructions. Important note: Any data that was not written to the disks will be lost during a reboot. For additional information, check the following resources: Rescue System: https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system Installimage: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images other articles: https://docs.hetzner.com/robot -------------------- Rescue System (via Legacy/CSM) up since 2024-07-31 09:16 +02:00 Hardware data: CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8) Memory: 64099 MB Disk /dev/nvme0n1: 512 GB (=> 476 GiB) doesn't contain a valid partition table Disk /dev/nvme1n1: 512 GB (=> 476 GiB) doesn't contain a valid partition table Total capacity 953 GiB with 2 Disks Network data: eth0 LINK: yes MAC: 90:1b:0e:c4:28:b4 IP: 144.76.164.201 IPv6: 2a01:4f8:200:40d7::2/64 Intel(R) PRO/1000 Network Driver root@rescue ~ #
- here's our disk info. So each disk is already only 476.9G. It'll be a bit less after we put a filesystem on it, I'm sure
root@rescue ~ # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS loop0 7:0 0 3.1G 1 loop nvme0n1 259:0 0 476.9G 0 disk nvme1n1 259:1 0 476.9G 0 disk root@rescue ~ #
- I'm reading through the guide on hetzner's `installimage` tool https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/
- the guide suggests a couple commands to see if we have a hardware raid. I didn't think we did, and this appears to confirm it
root@rescue ~ # megacli -LDInfo -Lall -Aall Exit Code: 0x00 root@rescue ~ # root@rescue ~ # arcconf GETCONFIG 1 LD Controllers found: 0 Invalid controller number. root@rescue ~ #
- it appears we don't have any software RAIDs already setup either
root@rescue ~ # ls /dev/md* ls: cannot access '/dev/md*': No such file or directory root@rescue ~ #
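- /proc/mdstat would show the same thing; for reference, the usual checks (not captured here) are:
cat /proc/mdstat
mdadm --detail --scan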
- quick disk tests show buffered reads of ~2.3 GB/s (roughly 18.6 Gb/s). At first glance that looked slower than the advertised "6 Gb/s" on our prod server, but the units don't match: 6 Gb/s is the SATA interface limit in bits, so this NVMe is actually well past it (though admittedly I never benchmarked the prod disks)
root@rescue ~ # hdparm -Ttv /dev/nvme0n1 /dev/nvme0n1: readonly = 0 (off) readahead = 256 (on) geometry = 488386/64/32, sectors = 1000215216, start = 0 Timing cached reads: 35298 MB in 1.97 seconds = 17911.22 MB/sec Timing buffered disk reads: 6992 MB in 3.00 seconds = 2330.11 MB/sec root@rescue ~ #
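- note that hdparm only measures sequential reads; if I want a fuller picture of the NVMe's random IO later, something like fio would do it (a sketch only, I did not run this on the box):
apt-get -y install fio
fio --name=randread --filename=/dev/nvme0n1 --readonly --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=30 --time_based --group_reporting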
- ok, I ran installimage
root@rescue ~ # installimage
- I selected "Debian"
- I selected "Debian-1205-bookworm-amd64-base"
- it dumped me into a midnight commander editor and said I could save with F10
- it said that "by default all disks are used for software raid" — that sounds good
- the "standard config" file that it gave me was too long to try to copy & paste, but the default RAID looked like what we wanted
- the default partition layout had a 32G swap, 1G /boot, and the rest allocated to '/'
- hetzner2 has a 488M /boot that's currently 84% full (using 386M). The same 386M in a 1G /boot would be ~38% full, which is much better. That sounds good.
- hetzner2 has a 32G swap. It does get used, but it's currently using <1G. 32G should be fine.
- I thought about setting up an LVM. It would be a better idea than having everything in one big '/' partition, but it would inevitably require more maintenance. For the sake of keeping things simple for a non-profit that has no sysadmins on staff, I'm going to stick to "just allocate the rest to '/'"
- oh, cool, the config file said what our disks are (if it can be trusted). It says: SAMSUNG MZVLB512HAJQ
- looks like they're from 2017 https://ssd.userbenchmark.com/SpeedTest/401452/SAMSUNG-MZVLB512HAJQ-000L2
- samsung advertises them as having 3.5 GB/s sequential read + 2.9 GB/s sequential write + 460K random read IOPS + 500K random write IOPS
- I decided to accept all these defaults and proceed with the install
- oh, except I did change the hostname line
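- for reference, the key lines of the installimage config ended up roughly like this (reconstructed from memory, so exact values may differ; the hostname value shown is a placeholder):
DRIVE1 /dev/nvme0n1
DRIVE2 /dev/nvme1n1
SWRAID 1
SWRAIDLEVEL 1
BOOTLOADER grub
HOSTNAME hetzner3
PART swap swap 32G
PART /boot ext3 1G
PART / ext4 all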
- ah shit, how do I send an F10 command over ssh? stupid midnight commander editor...
- I pressed Ctrl+[[ and that seemed to work
- the install finished in a few minutes
Hetzner Online GmbH - installimage Your server will be installed now, this will take some minutes You can abort at any time with CTRL+C ... : Reading configuration done : Loading image file variables done : Loading debian specific functions done 1/16 : Deleting partitions done 2/16 : Test partition size done 3/16 : Creating partitions and /etc/fstab done 4/16 : Creating software RAID level 1 done 5/16 : Formatting partitions : formatting /dev/md/0 with swap done : formatting /dev/md/1 with ext3 done : formatting /dev/md/2 with ext4 done 6/16 : Mounting partitions done 7/16 : Sync time via ntp done : Importing public key for image validation done 8/16 : Validating image before starting extraction done 9/16 : Extracting image (local) done 10/16 : Setting up network config done 11/16 : Executing additional commands : Setting hostname done : Generating new SSH keys done : Generating mdadm config done : Generating ramdisk done : Generating ntp config done 12/16 : Setting up miscellaneous files done 13/16 : Configuring authentication : Fetching SSH keys done : Disabling root password done : Disabling SSH root login with password done : Copying SSH keys done 14/16 : Installing bootloader grub done 15/16 : Running some debian specific functions done 16/16 : Clearing log files done INSTALLATION COMPLETE You can now reboot and log in to your new system with the same credentials that you used to log into the rescue system. root@rescue ~ #
- I have to say that I was happy that it generated new ssh host keys, and that it said it was importing a public key and validating the image before extraction.
- oh, and they disabled root password and ssh login with password. that's better than I expected from them. good.
- I ran `shutdown -h now`, and the server didn't come back. That's actually good. I wanted to see if I could shut the thing down (the safest machine is a machine that's off, especially before hardening), but the hetzner robot WUI didn't give an option to shutdown (only to reboot).
- after waiting 5 minutes with no pongs to my pings, I logged into the hetzner robot wui -> server -> reset tab -> Execute an automatic hardware reset, and I clicked "Send" https://robot.hetzner.com/server
- after a few more minutes with still no pongs to my pings, I logged into the hetzner robot wui -> server -> WOL tab -> clicked "Send WOL signal to server"
- after about 2 minutes, I started getting pings and was able to ssh-in as the 'root' user
- as soon as I got a shell from ssh, I quickly pasted-in my "jumpstart" provisioning and hardening commands to create a user for me, do basic ssh hardening, and setup a basic firewall to block everything except ssh
adduser maltfield --disabled-password --gecos ''
groupadd sshaccess
gpasswd -a maltfield sshaccess
mkdir /home/maltfield/.ssh/
echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== maltfield@ose" > /home/maltfield/.ssh/authorized_keys
chown -R maltfield:maltfield /home/maltfield/.ssh
chmod -R 0600 /home/maltfield/.ssh
chmod 0700 /home/maltfield/.ssh

# without this, apt-get may get stuck
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get -y install iptables iptables-persistent
apt-get -y purge nftables
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j DROP
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
iptables -A INPUT -j DROP
iptables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner 42 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner 1000 -j ACCEPT
iptables -A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
iptables -A OUTPUT -j DROP

ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A INPUT -s ::1/128 -d ::1/128 -j DROP
ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -A INPUT -j DROP
ip6tables -A OUTPUT -s ::1/128 -d ::1/128 -j ACCEPT
ip6tables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
ip6tables -A OUTPUT -m owner --uid-owner 42 -j ACCEPT
ip6tables -A OUTPUT -m owner --uid-owner 1000 -j ACCEPT
ip6tables -A OUTPUT -j DROP

iptables-save > /etc/iptables/rules.v4
ip6tables-save > /etc/iptables/rules.v6

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.orig.`date "+%Y%m%d_%H%M%S"`
grep 'Port 32415' /etc/ssh/sshd_config || echo 'Port 32415' >> /etc/ssh/sshd_config
grep 'AllowGroups sshaccess' /etc/ssh/sshd_config || echo 'AllowGroups sshaccess' >> /etc/ssh/sshd_config
grep 'PermitRootLogin no' /etc/ssh/sshd_config || echo 'PermitRootLogin no' >> /etc/ssh/sshd_config
grep 'PasswordAuthentication no' /etc/ssh/sshd_config || echo 'PasswordAuthentication no' >> /etc/ssh/sshd_config
systemctl restart sshd.service

apt-get -y upgrade
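- before closing the root session, it's worth a couple of sanity checks so a typo in sshd_config can't lock us out (keep the existing session open while testing a new login); something like:
sshd -t && echo 'sshd config OK'
iptables -L INPUT -n -v
grep -E '^(Port|AllowGroups|PermitRootLogin|PasswordAuthentication)' /etc/ssh/sshd_config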
- I added this new entry to my ose VM's /home/user/.ssh/config
Host hetzner3
   Hostname 144.76.164.201
   Port 32415
   ForwardAgent yes
   User maltfield
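- with that entry in place, connecting (whenever the box is powered on) is just:
ssh hetzner3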
- I then gave my user sudo permission
root@mail ~ # cp /etc/sudoers /etc/sudoers.20240731.orig root@mail ~ # root@mail ~ # visudo root@mail ~ # root@mail ~ # diff /etc/sudoers.20240731.orig /etc/sudoers 47a48 > maltfield ALL=(ALL:ALL) NOPASSWD:ALL root@mail ~ #
- alright, basic hardening is done.
- That's all I really wanted to achieve for now. Next I'd like to prepare some ansible playbooks to setup the rest of the basic hardening
- for now, I just want to leave this machine off in the meantime
- I attempted to shut it down again
- I left a ping open for 147 minutes, and I never got a pong back. So I'd say it's off. Great!
- it appears that I just have to trigger a WOL on the hetzner robot WUI to turn it back on, which I'll do after I spend some time working on the ansible roles & playbooks
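- if powering it on/off becomes a frequent thing, the Hetzner Robot webservice can probably trigger the WOL from a script too. This is untested, and the exact endpoint and auth details are assumptions based on their Robot webservice docs, but roughly:
# ROBOT_WS_USER/ROBOT_WS_PASS are the Robot webservice credentials (assumption; not the normal WUI login)
curl -u "$ROBOT_WS_USER:$ROBOT_WS_PASS" -X POST https://robot-ws.your-server.de/wol/144.76.164.201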
Tue July 30, 2024
- Marcin gave me the go-ahead to order a "hetzner3" server and begin provisioning it with Debian in preparation to migrate all our sites from the CentOS7 hetzner2 server to this new server
- This is going to be an enormous project. When I did the hetzner1 -> hetzner2 migration, I inherited both systems (in 2017). For some reason the websites were split across both servers (plus dreamhost too, iirc?), but I consolidated everything onto "hetzner2" and canceled "hetzner1" in 2018 https://wiki.opensourceecology.org/index.php?title=OSE_Server&oldid=298909#Assessment_of_Server_Options
- I'll be using ansible to assist in provisioning this server (and hopefully make it easier to provision future servers). Marcin expressed interest in lowering this barrier for others, as well.
- I noticed that 5 years ago I created a repo for OSE's ansible playbooks, but it's empty.
- I just added the LICENSE to this repo, and I plan to use it to publish our ansible roles/playbooks
- First thing I need to do is decide which server to buy from hetzner's dedicated server offerings
- holy crap, not only are their server auctions *much* cheaper per month, they also don't have a one-time-setup fee (usually ~$50-$200?)
- I've written pretty extensively in the past about what specs I'd be looking to get in a future OSE Server migration https://wiki.opensourceecology.org/index.php?title=OSE_Server&oldid=298909#OSE_Server_and_Server_Requirements
- In 2018, I said we'd want min 2-4 cores
- In 2018, I said we'd want min 8-16 G RAM
- In 2018, I said we'd want min ~200G disk
- Honestly, I expect that the lowest offerings of a dedicated server in 2024 are probably going to suffice for us, but what I'm mostly concerned about is the disk.
- even last week when I did the yum updates, I nearly filled the disk just by extracting a copy of our backups. Currently we have two 250G disks in a software RAID-1 (mirror) array. That gives us a usable 197G
- it's also provisioned with all the data on '/'. It would be smart if we set up an LVM
- It's important to me that we double this at-least, but I'll see if there's any deals on 1TB disks or larger
- also what we currently have is a 6 Gb/s SSD, so I don't want to downgrade that by going to a spinning-disk HDD. NVMe might be a welcome upgrade. I/O wait is probably a bottleneck, but not currently one that's causing us agony
- I spent some time reviewing the munin graphs
- load rarely ever touches 3. Most of the time it hovers between 0.2 - 1. So I agree that 4 cores is fine for us now.
- most of these auctions have an Intel Core i7-4770, which is a 4-core / 8-thread proc. That should be fine.
- somehow our varnish hits are way down. They used to average >80%, but currently they're down to 28-44%
- I documented these charts and my findings on a new Hetzner3 page
- I looked through the listings in the server auctions
- I don't want one that's only 32G RAM (few of these are)
- It looks like some have "2 x SSD SATA 250 GB" and some have "2 x SSD M.2 NVMe 512 GB". If we can, let's get the NVMe disks with better io
- there is one with "2 x HDD SATA 2,0 TB Enterprise". More space would be nice, but not at the sacrifice of io
- questions I have for hetzner:
- how many disk slots are there? Can we add more disks in the future?
- by default, do all these systems have RAID-1? Do we have other RAID options?
- oh, actually, there was only one server available for less than 38 EUR/mo that had the 2x 512GB NVME
- I went ahead and ordered it
- I also sent a separate message to hetzner sales asking them for detailed info about the different read & write speeds of their HDD, SSD, and NVME offerings in dedicated servers
- I sent an email to Marcin
Hey Marcin,

I just ordered a dedicated server from Hetzner with the following specs:

* Intel Core i7-6700
* 2x SSD M.2 NVMe 512 GB
* 4x RAM 16384 MB DDR4
* NIC 1 Gbit Intel I219-LM
* Location: Germany, FSN1
* Rescue system (English)
* 1 x Primary IPv4

While they had plenty of servers available with the i7-6700 and 16G of RAM, they only had one with 2x 512 GB NVMe disks (the others were just "SSD" disks). Those NVMe disks should give us a performance boost, so I snagged it while it was available.

I did some reviews of our munin charts to determine our hetzner3 server's needs. For more info, see

* https://wiki.opensourceecology.org/wiki/Hetzner3

Please let me know if you have any questions about this server.

Thank you,

Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org
Fri July 26, 2024
- I started the CHG-2024-07-26_yum_update today at 11:00
- pre-state proof shows we have lots of outdated system packages, as expected
[root@opensourceecology ~]# yum list updates ... xz-libs.x86_64 5.2.2-2.el7_9 updates yum.noarch 3.4.3-168.el7.centos base yum-cron.noarch 3.4.3-168.el7.centos base yum-plugin-fastestmirror.noarch 1.1.31-54.el7_8 base yum-utils.noarch 1.1.31-54.el7_8 base zlib.x86_64 1.2.7-21.el7_9 updates [root@opensourceecology ~]#
- I tried to check the backups log, but it was empty :/
[root@opensourceecology ~]# cat /var/log/backups/backup.log [root@opensourceecology ~]#
- ok, looks like it rotated already; this file shows a 20.424G backup file successfully uploaded to backblaze with rclone
[root@opensourceecology ~]# ls /var/log/backups/ backup.lo backup.log-20240628.gz backup.log-20240714.gz backup.log backup.log-20240629.gz backup.log-20240715.gz backup.log-20240615.gz backup.log-20240701.gz backup.log-20240716.gz backup.log-20240616.gz backup.log-20240702.gz backup.log-20240718.gz backup.log-20240617.gz backup.log-20240704.gz backup.log-20240719.gz backup.log-20240619.gz backup.log-20240706.gz backup.log-20240721.gz backup.log-20240621.gz backup.log-20240707.gz backup.log-20240722.gz backup.log-20240622.gz backup.log-20240708.gz backup.log-20240724.gz backup.log-20240623.gz backup.log-20240709.gz backup.log-20240725.gz backup.log-20240625.gz backup.log-20240711.gz backup.log-20240726 backup.log-20240626.gz backup.log-20240712.gz backup.log-20240627.gz backup.log-20240713.gz [root@opensourceecology ~]# [root@opensourceecology ~]# tail -n20 /var/log/backups/backup.log-20240726 * daily_hetzner2_20240726_072001.tar.gpg:100% /20.424G, 2.935M/s, - 2024/07/26 09:50:31 INFO : daily_hetzner2_20240726_072001.tar.gpg: Copied (new) 2024/07/26 09:50:31 INFO : Transferred: 20.424G / 20.424 GBytes, 100%, 2.979 MBytes/s, ETA 0s Transferred: 1 / 1, 100% Elapsed time: 1h57m0.8s real 117m1.219s user 4m20.240s sys 2m9.432s + echo ================================================================================ ================================================================================ ++ date -u +%Y%m%d_%H%M%S + echo 'INFO: Finished Backup Run at 20240726_095031' INFO: Finished Backup Run at 20240726_095031 + echo ================================================================================ ================================================================================ + exit 0 [root@opensourceecology ~]#
- the query of b2 backup files also looks good
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups | grep `date "+%Y%m%d"` daily_hetzner2_20240726_072001.tar.gpg [root@opensourceecology ~]# date -u Fri Jul 26 16:03:55 UTC 2024 [root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups daily_hetzner2_20240724_072001.tar.gpg daily_hetzner2_20240725_072001.tar.gpg daily_hetzner2_20240726_072001.tar.gpg monthly_hetzner2_20230801_072001.tar.gpg monthly_hetzner2_20230901_072001.tar.gpg monthly_hetzner2_20231001_072001.tar.gpg monthly_hetzner2_20231101_072001.tar.gpg monthly_hetzner2_20231201_072001.tar.gpg monthly_hetzner2_20240201_072001.tar.gpg monthly_hetzner2_20240301_072001.tar.gpg monthly_hetzner2_20240401_072001.tar.gpg monthly_hetzner2_20240501_072001.tar.gpg monthly_hetzner2_20240601_072001.tar.gpg monthly_hetzner2_20240701_072001.tar.gpg weekly_hetzner2_20240708_072001.tar.gpg weekly_hetzner2_20240715_072001.tar.gpg weekly_hetzner2_20240722_072001.tar.gpg yearly_hetzner2_20190101_111520.tar.gpg yearly_hetzner2_20200101_072001.tar.gpg yearly_hetzner2_20210101_072001.tar.gpg yearly_hetzner2_20230101_072001.tar.gpg yearly_hetzner2_20240101_072001.tar.gpg [root@opensourceecology ~]#
- that backup is already 8 hours old; so let's bring down the webserver + stop the databases and take a real fresh backup before we do anything
- stopped nginx
[root@opensourceecology ~]# # create dir for logging the change [root@opensourceecology ~]# tmpDir="/var/tmp/CHG-2024-07-26_yum_update" [root@opensourceecology ~]# mkdir -p $tmpDir [root@opensourceecology ~]# [root@opensourceecology ~]# # begin to gracefully shutdown nginx in the background [root@opensourceecology ~]# time nice /sbin/nginx -s quit nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11 real 0m0.078s user 0m0.038s sys 0m0.021s [root@opensourceecology ~]# [root@opensourceecology ~]# date -u Fri Jul 26 16:06:37 UTC 2024 [root@opensourceecology ~]#
- stopped DBs
[root@opensourceecology ~]# systemctl status mariadb ● mariadb.service - MariaDB database server Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled) Active: active (running) since Mon 2024-07-22 18:55:28 UTC; 3 days ago Process: 1230 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS) Process: 1099 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS) Main PID: 1229 (mysqld_safe) CGroup: /system.slice/mariadb.service ├─1229 /bin/sh /usr/bin/mysqld_safe --basedir=/usr └─1704 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql ... Jul 22 18:55:25 opensourceecology.org systemd[1]: Starting MariaDB database s.... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: Database M... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: If this is... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:28 opensourceecology.org systemd[1]: Started MariaDB database se.... Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology ~]# systemctl stop mariadb [root@opensourceecology ~]# systemctl status mariadb ● mariadb.service - MariaDB database server Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled) Active: inactive (dead) since Fri 2024-07-26 16:07:43 UTC; 3s ago Process: 1230 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS) Process: 1229 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS) Process: 1099 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS) Main PID: 1229 (code=exited, status=0/SUCCESS) Jul 22 18:55:25 opensourceecology.org systemd[1]: Starting MariaDB database s.... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: Database M... Jul 22 18:55:26 opensourceecology.org mariadb-prepare-db-dir[1099]: If this is... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:26 opensourceecology.org mysqld_safe[1229]: 240722 18:55:26 mysql... Jul 22 18:55:28 opensourceecology.org systemd[1]: Started MariaDB database se.... Jul 26 16:07:40 opensourceecology.org systemd[1]: Stopping MariaDB database s.... Jul 26 16:07:43 opensourceecology.org systemd[1]: Stopped MariaDB database se.... Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology ~]#
- the backup is taking a long time. While I wait, I checked `top`, and I see `gzip` is using 80-100% of a single core
- so it seems that gzip is bound by a single core. It could go much faster if the work could be split across multiple cores (parallel processing)
- quick googling while I wait suggests that we could use `pigz` as a replacement to `gzip` to get this (admittedly low priority) performance boost https://stackoverflow.com/questions/12313242/utilizing-multi-core-for-targzip-bzip-compression-decompression
- there are other options too. Apparently xz has native multi-threaded support since v5.2.0 https://askubuntu.com/a/858828
- there's also pbzip2 for bzip, and many others https://askubuntu.com/a/258228
- the other two commands that get stuck on one core are `tar` and `gpg2`
- it looks like gpg also attempts to compress the data itself. That gives us no benefit in our case because we're encrypting a tarball that just contains a bunch of already-compressed tarballs. So we could probably get some performance improvement by telling gpg to skip compression with `--compress-algo none` https://stackoverflow.com/questions/46261024/how-to-do-large-file-parallel-encryption-using-gnupg-and-gnu-parallel
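- if we ever want that speedup, the relevant part of the pipeline could look roughly like this (a sketch only, not what backup.sh actually does today; the real paths and filenames differ, and the timestamp is a placeholder):
# hypothetical sketch: parallel compression with pigz + no gpg re-compression
# (newer gpg releases may also need --pinentry-mode loopback for --passphrase-file in batch mode)
tar -cf - root/backups/sync/daily_hetzner2_YYYYMMDD_HHMMSS \
 | pigz -p "$(nproc)" \
 | gpg --batch --symmetric --compress-algo none --passphrase-file /root/backups/ose-backups-cron.key -o daily_hetzner2_YYYYMMDD_HHMMSS.tar.gz.gpg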
- finally (after ~30 min to generate the encrypted backup file), rclone is using >100% of CPU to upload it, so that's good. Our script does limit upload to 3 MB/s. I guess one improvement would be some argument to bypass that throttle
- it said the upload was going to take just under 2 hours, so I canceled it and manually ran the upload command (minus the throttle)
- upload speeds are now ~27-32 MB/s (so ~10x faster). It says it'll finish in just over 10 minutes.
- upload is done
[root@opensourceecology ~]# time sudo /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log ^C real 33m47.250s user 23m56.551s sys 2m2.866s [root@opensourceecology ~]# [root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20240726_160837.tar.gpg b2:ose-server-backups ... 2024/07/26 16:56:38 INFO : Transferred: 18.440G / 19.206 GBytes, 96%, 22.492 MBytes/s, ETA 34s Transferred: 0 / 1, 0% Elapsed time: 14m0.5s Transferring: * daily_hetzner2_20240726_160837.tar.gpg: 96% /19.206G, 21.268M/s, 36s 2024/07/26 16:57:36 INFO : daily_hetzner2_20240726_160837.tar.gpg: Copied (new) 2024/07/26 16:57:36 INFO : Transferred: 19.206G / 19.206 GBytes, 100%, 21.910 MBytes/s, ETA 0s Transferred: 1 / 1, 100% Elapsed time: 14m58.6s [root@opensourceecology ~]#
- ok, this very durable backup is uploaded; let's proceed
[root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups | grep `date "+%Y%m%d"` daily_hetzner2_20240726_072001.tar.gpg daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology ~]# date -u Fri Jul 26 16:58:11 UTC 2024 [root@opensourceecology ~]# sudo -u b2user /home/b2user/virtualenv/bin/b2 ls ose-server-backups daily_hetzner2_20240724_072001.tar.gpg daily_hetzner2_20240725_072001.tar.gpg daily_hetzner2_20240726_072001.tar.gpg daily_hetzner2_20240726_160837.tar.gpg monthly_hetzner2_20230801_072001.tar.gpg monthly_hetzner2_20230901_072001.tar.gpg monthly_hetzner2_20231001_072001.tar.gpg monthly_hetzner2_20231101_072001.tar.gpg monthly_hetzner2_20231201_072001.tar.gpg monthly_hetzner2_20240201_072001.tar.gpg monthly_hetzner2_20240301_072001.tar.gpg monthly_hetzner2_20240401_072001.tar.gpg monthly_hetzner2_20240501_072001.tar.gpg monthly_hetzner2_20240601_072001.tar.gpg monthly_hetzner2_20240701_072001.tar.gpg weekly_hetzner2_20240708_072001.tar.gpg weekly_hetzner2_20240715_072001.tar.gpg weekly_hetzner2_20240722_072001.tar.gpg yearly_hetzner2_20190101_111520.tar.gpg yearly_hetzner2_20200101_072001.tar.gpg yearly_hetzner2_20210101_072001.tar.gpg yearly_hetzner2_20230101_072001.tar.gpg yearly_hetzner2_20240101_072001.tar.gpg [root@opensourceecology ~]#
- we have a snapshot of the current state of packages
[root@opensourceecology ~]# time nice rpm -qa &> "${tmpDir}/before.log" real 0m0.716s user 0m0.678s sys 0m0.037s [root@opensourceecology ~]# [root@opensourceecology ~]# echo $tmpDir /var/tmp/CHG-2024-07-26_yum_update [root@opensourceecology ~]# [root@opensourceecology ~]# tail /var/tmp/CHG-2024-07-26_yum_update/before.log libdb-utils-5.3.21-25.el7.x86_64 libuser-0.60-9.el7.x86_64 python-lxml-3.2.1-4.el7.x86_64 net-snmp-agent-libs-5.7.2-48.el7_8.x86_64 epel-release-7-14.noarch perl-parent-0.225-244.el7.noarch libstdc++-devel-4.8.5-39.el7.x86_64 libsodium13-1.0.5-1.el7.x86_64 ncurses-5.9-14.20130511.el7_4.x86_64 e2fsprogs-libs-1.42.9-17.el7.x86_64 [root@opensourceecology ~]#
- I kicked-off the updates. I got a bit of a fright at first when we got "404 Not Found" errors from 484 mirrors, but eventually `yum` found a working mirror. I'm glad we did the updates now, before all the mirrors shut down (CentOS 7 stopped getting full updates years ago, and stopped receiving maintenance updates entirely as of a few weeks ago)
[root@opensourceecology ~]# grep "Error 404" /var/tmp/CHG-2024-07-26_yum_update/update.log | wc -l 484 [root@opensourceecology ~]# [root@opensourceecology ~]# cat /etc/centos-release CentOS Linux release 7.9.2009 (Core) [root@opensourceecology ~]#
- actually, it says it's updating 434 packages total. So I guess some dependencies got added to the 200-odd count before
- ok, the update command finished in just under 4 minutes of wall time
... real 3m56.410s user 2m1.833s sys 0m44.510s [root@opensourceecology ~]#
- post update info
[root@opensourceecology ~]# # log the post-state packages and versions [root@opensourceecology ~]# time nice rpm -qa &> "${tmpDir}/after.log" real 0m0.805s user 0m0.769s sys 0m0.036s [root@opensourceecology ~]# [root@opensourceecology ~]# time nice needs-restarting &> "${tmpDir}/needs-restarting.log" real 0m8.156s user 0m6.956s sys 0m0.652s [root@opensourceecology ~]# time nice needs-restarting -r &> "${tmpDir}/needs-reboot.log" real 0m0.155s user 0m0.104s sys 0m0.051s [root@opensourceecology ~]# [root@opensourceecology ~]# cat /var/tmp/CHG-2024-07-26_yum_update/needs-reboot.log Core libraries or services have been updated: systemd -> 219-78.el7_9.9 dbus -> 1:1.10.24-15.el7 openssl-libs -> 1:1.0.2k-26.el7_9 linux-firmware -> 20200421-83.git78c0348.el7_9 kernel -> 3.10.0-1160.119.1.el7 glibc -> 2.17-326.el7_9.3 Reboot is required to ensure that your system benefits from these updates. More information: https://access.redhat.com/solutions/27943 [root@opensourceecology ~]# [root@opensourceecology ~]# cat /var/tmp/CHG-2024-07-26_yum_update/needs-restarting.log 30842 : /usr/lib/systemd/systemd-udevd 13696 : sshd: maltfield@pts/0 27401 : /bin/bash 744 : /sbin/auditd 19086 : /bin/bash 13692 : sshd: maltfield [priv] 30672 : smtpd -n smtp -t inet -u 13699 : -bash 18035 : su - 27436 : less /root/backups/backup.sh 18036 : -bash 18030 : sudo su - 1484 : /var/ossec/bin/ossec-analysisd 24493 : /bin/bash 21581 : su - 21580 : sudo su - 21582 : -bash 797 : /usr/lib/systemd/systemd-logind 24476 : /bin/bash 1830 : qmgr -l -t unix -u 30673 : proxymap -t unix -u 19119 : sudo su - 24511 : /bin/bash 29833 : local -t unix 27417 : sudo su - 19130 : -bash 1 : /usr/lib/systemd/systemd --system --deserialize 23 29830 : cleanup -z -t unix -u 1500 : /var/ossec/bin/ossec-logcollector 24475 : SCREEN -S upgrade 2150 : /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a 127.0.0.1:6081 -T 127.0.0.1:6082 -S /etc/varnish/secret -u varnish -g varnish -s malloc,40G 2152 : /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a 127.0.0.1:6081 -T 127.0.0.1:6082 -S /etc/varnish/secret -u varnish -g varnish -s malloc,40G 29835 : bounce -z -t unix -u 775 : /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation 27419 : -bash 585 : /usr/lib/systemd/systemd-journald 771 : /usr/sbin/irqbalance --foreground 770 : /usr/sbin/acpid 1170 : /sbin/agetty --noclear tty1 linux 30690 : smtp -t unix -u 778 : /usr/sbin/chronyd 8695 : gpg-agent --daemon --use-standard-socket 24529 : /bin/bash 2121 : /var/ossec/bin/ossec-syscheckd 1806 : /usr/libexec/postfix/master -w 19129 : su - 19065 : /bin/bash 2124 : /var/ossec/bin/ossec-monitord 29832 : trivial-rewrite -n rewrite -t unix -u 19044 : /bin/bash 30693 : smtp -t unix -u 30692 : smtp -t unix -u 30691 : cleanup -z -t unix -u 27418 : su - 1475 : /var/ossec/bin/ossec-execd 19025 : /bin/bash 19024 : SCREEN -S CHG-2024-07-26_yum_update 1458 : /var/ossec/bin/ossec-maild 19023 : screen -S CHG-2024-07-26_yum_update [root@opensourceecology ~]#
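- a quick way to see exactly which packages changed is to diff the before/after snapshots, e.g.:
diff <(sort /var/tmp/CHG-2024-07-26_yum_update/before.log) <(sort /var/tmp/CHG-2024-07-26_yum_update/after.log) | less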
- alright, time to reboot
[root@opensourceecology ~]# reboot Connection to opensourceecology.org closed by remote host. Connection to opensourceecology.org closed. user@personal:~$
- system came back in about 1 minute
- first attempt to load the wiki resulted in a 503 "Error 503 Backend fetch failed" from varnish
- it's not just warming up; apache didn't come up on start
[root@opensourceecology ~]# systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Fri 2024-07-26 17:09:47 UTC; 2min 7s ago Docs: man:httpd(8) man:apachectl(8) Process: 1094 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE) Main PID: 1094 (code=exited, status=1/FAILURE) Jul 26 17:09:47 opensourceecology.org systemd[1]: Starting The Apache HTTP Se.... Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use:... Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use:... Jul 26 17:09:47 opensourceecology.org httpd[1094]: no listening sockets availa... Jul 26 17:09:47 opensourceecology.org httpd[1094]: AH00015: Unable to open logs Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service: main process...E Jul 26 17:09:47 opensourceecology.org systemd[1]: Failed to start The Apache .... Jul 26 17:09:47 opensourceecology.org systemd[1]: Unit httpd.service entered .... Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service failed. Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology ~]#
- it says that the port is already in use
[root@opensourceecology ~]# journalctl -u httpd --no-pager -- Logs begin at Fri 2024-07-26 17:09:34 UTC, end at Fri 2024-07-26 17:15:26 UTC. -- Jul 26 17:09:47 opensourceecology.org systemd[1]: Starting The Apache HTTP Server... Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:443 Jul 26 17:09:47 opensourceecology.org httpd[1094]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:443 Jul 26 17:09:47 opensourceecology.org httpd[1094]: no listening sockets available, shutting down Jul 26 17:09:47 opensourceecology.org httpd[1094]: AH00015: Unable to open logs Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE Jul 26 17:09:47 opensourceecology.org systemd[1]: Failed to start The Apache HTTP Server. Jul 26 17:09:47 opensourceecology.org systemd[1]: Unit httpd.service entered failed state. Jul 26 17:09:47 opensourceecology.org systemd[1]: httpd.service failed. [root@opensourceecology ~]#
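- I didn't capture it at the time, but the quick way to see what was already bound to port 443 would have been something like this (on this box, nginx is what's supposed to own 443; apache sits behind varnish/nginx on localhost ports):
ss -tlnp | grep ':443'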
- before I start making changes, I'm going to initiate another backup (and wait at least 30 minutes for the tar to finish)
- I'm going to want to diff the apache configs, so I copied the backup that I made just before the updates into the temp CHG dir
[root@opensourceecology CHG-2024-07-26_yum_update]# mkdir backup_before [root@opensourceecology CHG-2024-07-26_yum_update]# rsync -av --progress /home/b2user/sync.old/daily_hetzner2_20240726_160837.tar.gpg backup_before/ sending incremental file list daily_hetzner2_20240726_160837.tar.gpg 20,622,312,871 100% 127.14MB/s 0:02:34 (xfr#1, to-chk=0/1) sent 20,627,347,744 bytes received 35 bytes 133,510,341.61 bytes/sec total size is 20,622,312,871 speedup is 1.00 [root@opensourceecology CHG-2024-07-26_yum_update]#
- well, unfortunately the wiki being down means I can't reference our docs on how to restore backups, but I managed to figure it out
[root@opensourceecology backup_before]# gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --decrypt daily_hetzner2_20240726_160837.tar.gpg > daily_hetzner2_20240726_160837.tar gpg: AES256 encrypted data gpg: encrypted with 1 passphrase [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# tar -xf daily_hetzner2_20240726_160837.tar [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G daily_hetzner2_20240726_160837.tar.gpg 20G root [root@opensourceecology backup_before]# rm -f daily_hetzner2_20240726_160837.tar.gpg [root@opensourceecology backup_before]# [root@opensourceecology backup_before]# du -sh * 20G daily_hetzner2_20240726_160837.tar 20G root [root@opensourceecology backup_before]#
- to make this easier for the next person, I created a README directly in the backups dir
[root@opensourceecology backups]# cat /root/backups/README.txt 2024-07-26 ========== The process to restore from backups is documented on the wiki * https://wiki.opensourceecology.org/wiki/Backblaze#Restore_from_backups Oh, the wiki is down and you need to restore from backups to restore the wiki? Don't worry, I got you. All backups are stored on Backblaze B2. You can download them with rclone or just by logging into the Backblaze B2 WUI. First decrypt the main wrapper tar with `gpg` gpg --batch --passphrase-file <path-to-symmetric-encrypton-private-key> --decrypt <path-to-encrypted-tarball> > <path-to-decrypted-tarball> For example: gpg --batch --passphrase-file /root/backups/ose-backups-cron.key --decrypt daily_hetzner2_20240726_160837.tar.gpg > daily_hetzner2_20240726_160837.tar Then you can untar the wrapper tarball and the compressed tarball inside of that. For example: tar -xf daily_hetzner2_20240726_160837.tar cd root/backups/sync/daily_hetzner2_20240726_160837/www/ tar -xf www.20240726_160837.tar.gz head var/www/html/www.opensourceecology.org/htdocs/index.php --Michael Altfield <https://michaelaltfield.net.> [root@opensourceecology backups]#
- and I was able to extract the www files from the backups prior to the update
[root@opensourceecology backup_before]# cd root/backups/sync/daily_hetzner2_20240726_160837/www/ [root@opensourceecology www]# [root@opensourceecology www]# ls www.20240726_160837.tar.gz [root@opensourceecology www]# [root@opensourceecology www]# tar -xf www.20240726_160837.tar.gz [root@opensourceecology www]# [root@opensourceecology www]# du -sh * 32G var 19G www.20240726_160837.tar.gz [root@opensourceecology www]#
- oh, actually I want the /etc/ config files
[root@opensourceecology www]# cd ../etc [root@opensourceecology etc]# [root@opensourceecology etc]# tar -xf etc.20240726_160837.tar.gz [root@opensourceecology etc]# [root@opensourceecology etc]# du -sh * 46M etc 13M etc.20240726_160837.tar.gz [root@opensourceecology etc]#
- a diff of the pre-update configs and the current configs shows 4x new files
[root@opensourceecology etc]# diff -ril etc/httpd /etc/httpd diff: etc/httpd/logs: No such file or directory diff: etc/httpd/modules: No such file or directory Only in /etc/httpd/conf.d: autoindex.conf Only in /etc/httpd/conf.d: ssl.conf Only in /etc/httpd/conf.d: userdir.conf Only in /etc/httpd/conf.d: welcome.conf [root@opensourceecology etc]#
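- these look like the stock config files that the updated httpd/mod_ssl packages re-installed (our hardened config had removed them); ssl.conf in particular ships its own Listen 443, which would collide with nginx already bound to 443. Something like this would confirm which packages own them:
rpm -qf /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/autoindex.conf /etc/httpd/conf.d/userdir.conf /etc/httpd/conf.d/welcome.conf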
- I just moved these 4x files out (into our tmp change dir), and tried a restart; it came up
[root@opensourceecology CHG-2024-07-26_yum_update]# mkdir moved_from_etc_httpd [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/autoindex.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/ssl.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/userdir.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# mv /etc/httpd/conf.d/welcome.conf moved_from_etc_httpd/ [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# systemctl restart httpd [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# systemctl status httpd ● httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2024-07-26 17:59:36 UTC; 4s ago Docs: man:httpd(8) man:apachectl(8) Main PID: 15908 (httpd) Status: "Processing requests..." CGroup: /system.slice/httpd.service ├─15908 /usr/sbin/httpd -DFOREGROUND ├─15910 /usr/sbin/httpd -DFOREGROUND ├─15911 /usr/sbin/httpd -DFOREGROUND ├─15912 /usr/sbin/httpd -DFOREGROUND ├─15913 /usr/sbin/httpd -DFOREGROUND ├─15914 /usr/sbin/httpd -DFOREGROUND ├─15921 /usr/sbin/httpd -DFOREGROUND ├─15927 /usr/sbin/httpd -DFOREGROUND ├─15928 /usr/sbin/httpd -DFOREGROUND ├─15936 /usr/sbin/httpd -DFOREGROUND ├─15937 /usr/sbin/httpd -DFOREGROUND ├─15938 /usr/sbin/httpd -DFOREGROUND └─15939 /usr/sbin/httpd -DFOREGROUND Jul 26 17:59:36 opensourceecology.org systemd[1]: Starting The Apache HTTP Se.... Jul 26 17:59:36 opensourceecology.org systemd[1]: Started The Apache HTTP Server. Hint: Some lines were ellipsized, use -l to show in full. [root@opensourceecology CHG-2024-07-26_yum_update]#
- I was able to load and edit the wiki; I spent some time adding some updates to the CHG article https://wiki.opensourceecology.org/wiki/CHG-2020-05-04_yum_update
- for some reason my browser keeps locking-up when all I'm trying to do is edit the text in the textarea for ^ this wiki article. I don't use the wysiwyg editor. I'm literally just editing text in a textarea; that shouldn't require any processing
- It took me ~20 minutes just to make a few changes to one wiki article because the page on firefox kept locking-up, sometimes displaying a spinning circle over the page
- I launched a new DispVM with *only* firefox running and *only* one tab open in firefox. The issue persisted, and the VM with the (idle) firefox on the edit page was taxed with 20-60% CPU usage; something is definitely wrong, but it's unclear if the bug is on our mediawiki server, my firefox client, or both
- anyway, I'm continuing with the validation steps
- I was successfully able to load the frontpage of all the 9x websites
- the logo at the top (and bottom) of https://oswh.opensourceecology.org/ was missing, but I'm not sure if that was the case before the updates or not
- I simply get a 404 on the image http://www.opensourcewarehouse.org/wp-content/uploads/2013/02/headfooter-logonew.png
- I guess the domain is wrong; we don't appear to use opensourcewarehouse.org anymore, so I guess this was an issue that predates our updates now
- everything else looked good
- I logged into the munin. It loads fine
- I do see some gaps in the mysql charts where everything drops to 0 for a few hours, which I guess is when (and why) Marcin had to do the reboots. My job isn't to investigate that now, but I'm just making a note here
- otherwise munin is working; validated.
- I logged into awstats. It loads fine
- I just quickly scanned the main pages for www.opensourceecology.org and wiki.opensourceecology.org; they look fine
- I already tested edits on wiki.opensourceecology.org; they're working (setting aside the client-side lag)
- I was successfully able to make a trivial change to the main wordpress site, and then revert that change https://www.opensourceecology.org/offline-wiki-zim-kiwix/
- the only thing left is the backups, which have been running in the background since shortly after the reboot
- the backups finished being created successfully
- the backups are currently being uploaded at the rate-limited 3 MB/s. they're at 39% now, and estimated to finish uploading in 1h10m from now.
- the upload is the last step; that's good enough for me to consider the backups functional
- that completes our validation; I think it's safe to mark this change as successful
- I sent an update email to Marcin & Catarina
Hey Marcin & Catarina,

I've finished updating the system packages on the hetzner2 server. It's a very good thing that we did this, because your server tried and failed to download its updates from 484 mirrors before it finally found a server that it could download its updates from.

As I mentioned in Nov 2022, your server runs CentOS 7, which stopped receiving "Full Updates" by Red Hat in Aug 2020. As of Jun 2024, it is no longer going to be updated in any way (security, maintenance, etc). At some point in the future, I guess all of their update servers will go down too. We're lucky at least one was still online.

* https://wiki.opensourceecology.org/wiki/Maltfield_Log/2022#Wed_November_02.2C_2022

Today I was successfully able to update 434 system packages onto hetzner2. I did some quick validation of a subset of your websites, and I only found a couple minor errors

1. The header & footer images of oswh don't load https://oswh.opensourceecology.org/
2. Editing the wiki sometimes causes my browser to lock-up; it's not clear if this is a pre-existing issue, or if the issue is caused by your server or my client

I did not update your server's applications that cannot be updated by the package manager (eg wordpress, mediawiki, etc). If you don't detect any issues with your server, then I would recommend that we do the application upgrade simultaneously with a migration to a new server running Debian.

I'd like to stress again the urgency of the need to migrate off of CentOS 7. Besides the obvious security risks of running a server that is no longer receiving security patches, at some point in the likely-not-too-distant future, your server is going to break and it will be extremely non-trivial to fix it. The deadline for migrating was in 2020. I highly recommend prioritizing a project to migrate your server to a new Debian server ASAP.

Please spend some time testing your various websites, and let me know if you experience any issues.

Thank you,

Michael Altfield
Senior Technology Advisor
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7 70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org
- I confirmed the list of updates on the server is now empty
[root@opensourceecology CHG-2024-07-26_yum_update]# yum list updates Loaded plugins: fastestmirror, replace Loading mirror speeds from cached hostfile * base: ftp.plusline.net * epel: mirrors.n-ix.net * extras: ftp.plusline.net * updates: mirror.checkdomain.de [root@opensourceecology CHG-2024-07-26_yum_update]#
- I'm considering the change successful
- looks like my tmp change dir pushed the disk to 86% capacity; let's clean that up
[root@opensourceecology CHG-2024-07-26_yum_update]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 32G 0 32G 0% /dev tmpfs 32G 0 32G 0% /dev/shm tmpfs 32G 17M 32G 1% /run tmpfs 32G 0 32G 0% /sys/fs/cgroup /dev/md2 197G 161G 27G 86% / /dev/md1 488M 386M 77M 84% /boot tmpfs 6.3G 0 6.3G 0% /run/user/1005 [root@opensourceecology CHG-2024-07-26_yum_update]# ls after.log before.log needs-reboot.log update.log backup_before moved_from_etc_httpd needs-restarting.log [root@opensourceecology CHG-2024-07-26_yum_update]# du -sh * 28K after.log 70G backup_before 28K before.log 28K moved_from_etc_httpd 4.0K needs-reboot.log 4.0K needs-restarting.log 216K update.log [root@opensourceecology CHG-2024-07-26_yum_update]# ls before.log before.log [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# ls backup_before/ daily_hetzner2_20240726_160837.tar root [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# du -sh backup_before/* 20G backup_before/daily_hetzner2_20240726_160837.tar 51G backup_before/root [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# ls backup_before/root backups [root@opensourceecology CHG-2024-07-26_yum_update]# [root@opensourceecology CHG-2024-07-26_yum_update]# rm -rf backup_before/root [root@opensourceecology CHG-2024-07-26_yum_update]#
- great, now we're down to 59%
[root@opensourceecology CHG-2024-07-26_yum_update]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 32G 0 32G 0% /dev tmpfs 32G 0 32G 0% /dev/shm tmpfs 32G 17M 32G 1% /run tmpfs 32G 0 32G 0% /sys/fs/cgroup /dev/md2 197G 110G 78G 59% / /dev/md1 488M 386M 77M 84% /boot tmpfs 6.3G 0 6.3G 0% /run/user/1005 [root@opensourceecology CHG-2024-07-26_yum_update]#
Wed July 24, 2024
- Marcin contacted me a few days ago saying that the server needs reboots again
- I found that the last time we did a system packages update was in 2020, over 4 years ago. I strongly recommended that we update the system packages, and probably the web applications as well
- here's the link to the last time I updated the system packages in May 2020 https://wiki.opensourceecology.org/wiki/CHG-2020-05-04_yum_update
- I also noted that CentOS is now not only EOL, but it's also no longer receiving (security) updates.
- I warned Marcin about this approaching deadline in Nov 2022, and urged him to migrate to a new OS before 2024.
In my prior work at OSE, I've done my best to design your systems to be robust and "well oiled" so that they would run for as long as possible with as little maintenance as possible. However, code rots over time, and there's only so long you can hold-off before things fall apart.

Python 2.7.5 was End-of-Life'd on 2020-01-01, and it no longer receives any updates.

* https://en.wikipedia.org/wiki/History_of_Python

CentOS 7.7 was released 2019-09-17. "Full Updates" stopped 2020-08-06, and it will no longer receive any maintenance updates after 2024-06-30.

* https://wiki.centos.org/About/Product

At some point, you're going to want to migrate to a new server with a new OS. I strongly recommend initiating this project before 2024.
- Here's the log entry https://wiki.opensourceecology.org/wiki/Maltfield_Log/2022#Wed_November_02.2C_2022
- I told Marcin to budget for ~$10,000 to migrate to a new server, as it's going to be a massive project that will likely require more than a month of full-time work to complete the migration
- Marcin said I should go ahead and prepare a CHG "ticket" article for the upgrade and schedule a time to do it
- I prepared a change ticket for updating the system packages on Friday https://wiki.opensourceecology.org/wiki/CHG-2024-07-26_yum_update
- I also noticed that I kept getting de-auth'd every few minutes on the wiki. That's annoying. Hopefully the updates will make this (and other) issues go away.
- If we did a migration to debian, then we'd need to migrate to a new server
- previously when we migrated from hetzner1 to hetzner2, we got a 15x increase in RAM (from 4GB to 64GB). And the price of both servers was the same!
- I was expecting the next jump would have similar results: we'd migrate to a new server that costs the same for much better specs, but that's not looking like it's going to be the case :(
- Here's the currently-offered dedicated servers at hetzner https://www.hetzner.com/dedicated-rootserver/
- Currently we have 8-cores, 64G RAM, and two 250G disks in a RAID-1 software array. We pay 39 EUR/mo
- The cheapest dedicated server (EX44) currently is 46.41 EUR/month and comes with 14-cores, 64G RAM, and 2x 512G disks. That should meet our requirements https://www.hetzner.com/dedicated-rootserver/ex44/configurator/#/
- oh crap, we'd be downgrading the proc from the i7 (Intel® Core™ i7-6700) to an i5 (Intel® Core™ i5-13500)
- I'd have to check the munin charts, but I would be surprised if we ever break a load of 2, so that's still probably fine.
- I met with Marcin tonight to discuss [a] the system-level package upgrades, [b] the application (eg wordpress, mediawiki, etc) upgrades, and [c] the server migration
- I recommended that Marcin do the updates on staging, and described the risk of not doing it
- the problem is that the current staging environment is down, and it may take a few days to restore it
- the risk is maybe a few days of downtime, instead of a smaller, planned change window during the update
- we agreed that I'll do the system-level package upgrades direct-to-production; Marcin accepted the risk of a few days of downtime
- Marcin also mentioned that Hetzner has a "server auction" page, which has some more servers that meet our needs at a slightly discounted price https://www.hetzner.com/sb/
- actually many of these are 37.72 EUR/mo, so they're actually *cheaper* than our current 39 EUR/mo. Great!
- there's >3 pages of servers for this 37.72 EUR/mo price. One of them has 2x 4TB drives (though it looks like spinning disks). This is a server graveyard built-to-spec for previous customers, it seems. We should be able to find one that meets our needs, so that means we'll easily double our disk and save ~15 EUR per year. Cool :)